Deep Hashing for Malware Family Classification and New Malware Identification

Cited by: 2
Authors
Zhang, Yunchun [1]
Liao, Zikun [1]
Zhang, Ning [1]
Min, Shaohui [1]
Wang, Qi [1]
Quek, Tony Q. S. [2]
Zhao, Mingxiong [1]
Affiliations
[1] Yunnan Univ, Engn Res Ctr Cyberspace, Natl Pilot Sch Software, Kunming 650500, Peoples R China
[2] Singapore Univ Technol & Design, Informat Syst Technol & Design, Singapore 487372, Singapore
Source
IEEE INTERNET OF THINGS JOURNAL | 2024 / Vol. 11 / No. 16
Funding
National Natural Science Foundation of China
Keywords
Malware; Feature extraction; Image retrieval; Image classification; Artificial neural networks; Internet of Things; Semantics; Deep hashing; Deep neural networks (DNNs); Malware classification; Malware images; Network
DOI
10.1109/JIOT.2024.3353250
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Although many state-of-the-art deep neural networks have recently been proposed for malware classification, detecting malware effectively in large-scale sample sets and identifying zero-day or new malware variants remain significant challenges. To address them, we design a deep hashing-based malware classification model with two parts: 1) ResNet50-based deep hashing for malware retrieval and 2) voting-based malware classification. In the first part, multiple deep hashing models are built on the high-layer outputs (feature maps) of a ResNet50 trained on malware gray-scale images. To maximize the Hamming distance (dissimilarity) between hash codes of malware samples from different families, a ResNet50-based deep polarized network (RNDPN) is designed to return the Top K most similar samples. In the second part, we propose a majority voting method and a Hamming-distance-based voting method that identify malware from the retrieved results. Experimental results show that RNDPN outperforms the other six deep hashing models, reaching 97.54% mean average precision (mAP) for malware retrieval when only 40 similar samples are retrieved; all deep hashing models achieve their best results with a 48-bit hash code length. Furthermore, the Hamming-distance-based voting method implemented with RNDPN outperforms the other models in malware classification, achieving 1) a malware classification accuracy of 96.5% and 2) an accuracy of 85.7% in identifying new or zero-day malware.
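The retrieve-then-vote idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy 8-bit codes, and in particular the vote weight (code length minus Hamming distance) are assumptions made for illustration, since the abstract does not specify the exact voting rule. In the paper the binary codes would come from the trained hashing network; here they are written by hand.

```python
from collections import Counter, defaultdict

import numpy as np


def hamming_distances(query, database):
    """Hamming distance between one binary code and every row of `database`."""
    return np.count_nonzero(database != query, axis=1)


def retrieve_top_k(query, database, labels, k):
    """Return (family label, distance) pairs for the k nearest hash codes."""
    dists = hamming_distances(query, database)
    order = np.argsort(dists, kind="stable")[:k]
    return [(labels[i], int(dists[i])) for i in order]


def majority_vote(neighbors):
    """Predict the family that appears most often among the retrieved samples."""
    return Counter(label for label, _ in neighbors).most_common(1)[0][0]


def hamming_vote(neighbors, code_length):
    """Predict the family with the largest closeness-weighted score.

    Assumed weight per neighbor: code_length - distance (hypothetical choice).
    """
    scores = defaultdict(int)
    for label, dist in neighbors:
        scores[label] += code_length - dist
    return max(scores, key=scores.get)


# Toy database of 8-bit hash codes with hypothetical family labels.
codes = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0],  # family "A"
    [1, 1, 1, 1, 1, 0, 0, 0],  # family "B"
    [0, 0, 0, 1, 1, 1, 1, 1],  # family "B"
    [1, 1, 1, 1, 1, 1, 1, 1],  # family "C"
])
labels = ["A", "B", "B", "C"]
query = np.zeros(8, dtype=int)

neighbors = retrieve_top_k(query, codes, labels, k=3)
print(neighbors)
print(majority_vote(neighbors))
print(hamming_vote(neighbors, code_length=8))
```

In this toy case the two rules disagree: majority voting picks family "B" because two of the three retrieved samples belong to it, while the distance-weighted rule picks "A" because its single sample is an exact match. A check for new malware (for example, rejecting a query whose nearest neighbors are all far away) would need a threshold that the abstract does not describe, so it is left out here.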
Pages: 26837-26851
Page count: 15