BHMDC: A byte and hex n-gram based malware detection and classification method

Cited by: 4
Authors
Tang, Yonghe [1]
Qi, Xuyan [1]
Jing, Jing [1]
Liu, Chunling [1]
Dong, Weiyu [1]
Affiliation
[1] State Key Lab Math Engn & Adv Comp, Zhengzhou 450000, Peoples R China
Keywords
Malware detection; Malware classification; Byte n-gram; Hex n-gram; Random forest; Light gradient boosting machine; MODEL;
DOI
10.1016/j.cose.2023.103118
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In recent years, malware and its variants have proliferated, posing a grave threat to the security of systems and networks, so it is urgent to detect and classify malware in time to prevent the spread of malicious activity. However, existing malware detection and classification methods do not fully meet application requirements. Among them, machine learning-based approaches generally struggle to balance efficiency and accuracy because of imperfect feature representation, while deep learning-based methods are usually computationally intensive to train and deploy. To address this problem, we focus on improving feature extraction and the classification model, and propose a Byte and Hex n-gram based Malware Detection and Classification method called BHMDC. For malware detection, LightGBM is used with only 256-dimensional byte unigram features, achieving an accuracy of more than 99.70% on two constructed datasets with less time consumption. For malware classification, block byte unigram and hex n-gram features are proposed and combined, which preserves more properties and profiles executable files in a multi-granular way; random forest is then used to optimize the features by removing redundant information and reducing the dimensionality, and LightGBM is finally used to identify malware families. The performance of the proposed approach is evaluated through experiments and compared with state-of-the-art methods. The proposed approach achieves 99.264% accuracy on the Microsoft malware classification challenge dataset and 99.775% accuracy on the Malimg dataset, substantially outperforming the other approaches. These promising experimental results indicate that BHMDC can be used in antivirus software to detect malware variants and to help security analysts identify malware families. (c) 2023 Published by Elsevier Ltd.
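As a rough illustration of the two feature types the abstract names, the sketch below computes a 256-dimensional byte unigram vector and counts hex n-grams for a byte string. It is not the authors' implementation: the function names `byte_unigram` and `hex_ngrams` are hypothetical, and both the normalization by file length and the definition of a hex n-gram as an overlapping window over the hexadecimal digit string are assumptions, since the abstract does not specify them. The block features, random forest feature selection, and LightGBM classifier are omitted.

```python
from collections import Counter


def byte_unigram(data: bytes) -> list[float]:
    """Return a 256-dimensional histogram of byte values.

    Normalizing by input length is an assumption made for this sketch.
    """
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    total = len(data) or 1  # avoid division by zero on empty input
    return [c / total for c in counts]


def hex_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count overlapping n-grams over the hex-digit string of the input.

    This definition of a hex n-gram is assumed, not taken from the paper.
    """
    hex_text = data.hex()
    return Counter(hex_text[i:i + n] for i in range(len(hex_text) - n + 1))


# Toy input: six bytes that start like a PE header ("MZ"), repeated once.
sample = bytes([0x4D, 0x5A, 0x90, 0x00, 0x4D, 0x5A])
features = byte_unigram(sample)
print(len(features), features[0x4D])
print(hex_ngrams(sample, n=2)["4d"])
```

A fixed-length vector like `features` is the kind of input a gradient boosting classifier such as LightGBM accepts directly, which is consistent with the low time consumption the abstract reports for detection.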
Pages: 13