Deep Hashing for Malware Family Classification and New Malware Identification

被引:2
|
作者
Zhang, Yunchun [1 ]
Liao, Zikun [1 ]
Zhang, Ning [1 ]
Min, Shaohui [1 ]
Wang, Qi [1 ]
Quek, Tony Q. S. [2 ]
Zhao, Mingxiong [1 ]
机构
[1] Yunnan Univ, Engn Res Ctr Cyberspace, Natl Pilot Sch Software, Kunming 650500, Peoples R China
[2] Singapore Univ Technol & Design, Informat Syst Technol & Design, Singapore 487372, Singapore
来源
IEEE INTERNET OF THINGS JOURNAL | 2024年 / 11卷 / 16期
基金
中国国家自然科学基金;
关键词
Malware; Feature extraction; Image retrieval; Image classification; Artificial neural networks; Internet of Things; Semantics; Deep hashing; deep neural networks (DNNs); image retrieval; malware classification; malware images; SEMANTICS; NETWORK;
D O I
10.1109/JIOT.2024.3353250
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Although numerous state-of-the-art deep neural networks have recently been proposed for malware classification, effectively detecting malware on a large-scale sample set and identifying zero-day or new malware variants still pose significant challenges. To address this issue, a deep hashing-based malware classification model is designed for malware identification, including two parts: 1) ResNet50-based deep hashing for malware retrieval and 2) voting-based malware classification. Specifically, multiple deep hashing models are developed by extracting the high-layer outputs (feature maps) from the ResNet50 trained with malware gray-scale images in the first part. In this case, to maximize the Hamming distance or dissimilarity among hash values computed with malware samples under different families, a ResNet50-based deep polarized network (RNDPN) is designed to return Top K similar samples. In the second part, we propose a majority-voting and a Hamming-distance-based voting for malware identification according to the retrieved results. The experiment results show that RNDPN outperforms the other six deep hashing models with 97.54% mean average precision (mAP) for malware retrieval when only 40 similar examples are retrieved, where the best results for all deep hashing models are observed with 48-bits hashing code length. Furthermore, the Hamming distance-based voting method implemented with RNDPN demonstrates unparalleled performance in malware classification compared to other models. Notably, it achieves exceptional results in two key aspects: 1) malware classification accuracy with an impressive accuracy rate of 96.5% and 2) the identification of new or zero-day malware with a commendable accuracy of 85.7%.
引用
收藏
页码:26837 / 26851
页数:15
相关论文
共 50 条
  • [21] Malware Family Classification using LSTM with Attention
    Xie, Qi
    Wang, Yongjun
    Qin, Zhiquan
    2020 13TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2020), 2020, : 966 - 970
  • [22] Malware Classification with Deep Convolutional Neural Networks
    Kalash, Mahmoud
    Rochan, Mrigank
    Mohammed, Noman
    Bruce, Neil D. B.
    Wang, Yang
    Iqbal, Farkhund
    2018 9TH IFIP INTERNATIONAL CONFERENCE ON NEW TECHNOLOGIES, MOBILITY AND SECURITY (NTMS), 2018,
  • [23] Deep Learning Framework and Visualization for Malware Classification
    Akarsh, S.
    Simran, K.
    Poornachandran, Prabaharan
    Menon, Vijay Krishna
    Soman, K. P.
    2019 5TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING & COMMUNICATION SYSTEMS (ICACCS), 2019, : 1059 - 1063
  • [24] Malware Classification Using Deep Learning Methods
    Cakir, Bugra
    Dogdu, Erdogan
    ACMSE '18: PROCEEDINGS OF THE ACMSE 2018 CONFERENCE, 2018,
  • [25] Towards an interpretable deep learning model for mobile malware detection and family identification
    Iadarola, Giacomo
    Martinelli, Fabio
    Mercaldo, Francesco
    Santone, Antonella
    COMPUTERS & SECURITY, 2021, 105
  • [26] A Modified ResNeXt for Android Malware Identification and Classification
    Albahar, Marwan Ali
    ElSayed, Mahmoud Said
    Jurcut, Anca
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2022, 2022
  • [27] Using String Information for Malware Family Identification
    Shrestha, Prasha
    Maharjan, Suraj
    de la Rosa, Gabriela Ramirez
    Sprague, Alan
    Solorio, Thamar
    Warner, Gary
    ADVANCES IN ARTIFICIAL INTELLIGENCE (IBERAMIA 2014), 2014, 8864 : 686 - 697
  • [28] Malware-on-the-Brain: Illuminating Malware Byte Codes With Images for Malware Classification
    Zhong, Fangtian
    Chen, Zekai
    Xu, Minghui
    Zhang, Guoming
    Yu, Dongxiao
    Cheng, Xiuzhen
    IEEE TRANSACTIONS ON COMPUTERS, 2023, 72 (02) : 438 - 451
  • [29] Malware Behavior Image for Malware Variant Identification
    Shaid, Syed Zainudeen Mohd
    Maarof, Mohd Aizaini
    2014 INTERNATIONAL SYMPOSIUM ON BIOMETRICS AND SECURITY TECHNOLOGIES (ISBAST), 2014, : 238 - 243
  • [30] A Multi-Dimensional Deep Learning Framework for IoT Malware Classification and Family Attribution
    Dib, Mirabelle
    Torabi, Sadegh
    Bou-Harb, Elias
    Assi, Chadi
    IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, 2021, 18 (02): : 1165 - 1177