BHMDC: A byte and hex n-gram based malware detection and classification method

Cited by: 4
Authors
Tang, Yonghe [1]
Qi, Xuyan [1]
Jing, Jing [1]
Liu, Chunling [1]
Dong, Weiyu [1]
Affiliations
[1] State Key Lab Math Engn & Adv Comp, Zhengzhou 450000, Peoples R China
Keywords
Malware detection; Malware classification; Byte n-gram; Hex n-gram; Random forest; Light gradient boosting machine; MODEL;
DOI
10.1016/j.cose.2023.103118
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In recent years, malware and its variants have proliferated, posing a grave threat to the security of systems and networks, so it is urgent to detect and classify malware in time to prevent the spread of malicious activities. However, existing malware detection and classification methods cannot fully meet the requirements of practical applications. Among them, machine learning-based approaches generally struggle to balance efficiency and accuracy because of imperfect feature representation, while deep learning-based methods are usually computationally intensive to train and deploy. To solve this problem, we focus on improving the feature extraction and the classification model, and propose a Byte and Hex n-gram based Malware Detection and Classification method called BHMDC. For malware detection, LightGBM is used to detect malware with just 256-dimensional byte unigram features, achieving an accuracy of more than 99.70% on two constructed datasets with less time consumption. For malware classification, block byte unigram and hex n-gram features are proposed and combined, which preserves more properties and profiles executable files in a multi-granular way; random forest is then used to optimize the features by removing redundant information and reducing the dimensionality, and LightGBM is finally used to identify malware families. The performance of the proposed approach is evaluated through experiments and compared with state-of-the-art methods. The proposed approach achieves 99.264% accuracy on the Microsoft malware classification challenge dataset and 99.775% accuracy on the Malimg dataset, substantially outperforming the other approaches. The promising experimental results indicate that BHMDC can be used in antivirus software to detect malware variants and help security analysts identify malware families. (c) 2023 Published by Elsevier Ltd.
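The byte unigram feature mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the normalization to relative frequencies, and the tokenization used for hex n-grams (one two-digit hex token per byte) are assumptions made for illustration only, since the abstract does not specify them.

```python
from collections import Counter


def byte_unigram_features(data: bytes) -> list[float]:
    """Return a 256-dimensional vector of relative byte frequencies.

    Normalizing by file length is an assumption; raw counts would also
    give a 256-dimensional byte unigram feature.
    """
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    total = len(data)
    if total == 0:
        return [0.0] * 256
    return [c / total for c in counts]


def hex_ngram_counts(data: bytes, n: int) -> Counter:
    """Count n-grams over two-digit hex tokens (assumed tokenization)."""
    tokens = [f"{b:02X}" for b in data]
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


# Hypothetical sample: a few bytes resembling the start of a PE file.
sample = bytes([0x4D, 0x5A, 0x90, 0x00, 0x4D, 0x5A])
features = byte_unigram_features(sample)
bigrams = hex_ngram_counts(sample, 2)
```

A vector such as `features` could then be passed to a gradient boosting classifier for detection, while n-gram counts would need a feature selection step (the abstract names random forest) before classification.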
Pages: 13