A malware detection model based on imbalanced heterogeneous graph embeddings

Cited by: 9
Authors
Li, Tun [1]
Luo, Ya [1]
Wan, Xin [1]
Li, Qian [1]
Liu, Qilie [1]
Wang, Rong [1]
Jia, Chaolong [1]
Xiao, Yunpeng [1]
Affiliation
[1] Chongqing University of Posts and Telecommunications, Chongqing 400065, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Malware; Imbalanced networks; Generative adversarial networks; Heterogeneous graph; Representation learning; Neural networks; SMOTE
DOI
10.1016/j.eswa.2023.123109
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The proliferation of malware in recent years poses a significant threat to the security of computers and mobile devices, and detecting malware, especially on the Android platform, is a growing concern for researchers and the software industry. This paper proposes a new method for detecting Android malware based on imbalanced heterogeneous graph embedding. First, most malware datasets are imbalanced between malicious and benign samples, because some types of malware are scarce and difficult to collect. As a result, the classification algorithm has too little data to learn the minority samples, which degrades downstream classifier performance. Because generative adversarial networks are able to complete missing data, an algorithm for generating graph-structured data is presented, in which nodes are generated to simulate the distribution of minority nodes within the network topology. Second, because heterogeneous information networks retain rich node semantic features and support mining of implicit relationships, heterogeneous graphs are used to model different types of entities (Apps, APIs, permissions, intents, etc.) and different meta-paths. Finally, a new method is introduced to alleviate the over-smoothing of node information during propagation in deep networks. In the deep GCN, the leader nodes of each node are first sampled at every layer, and a residual connection and an identity mapping are then added in order to capture the characteristics of high-order leaders. A self-attention-based semantic fusion method is also applied to adaptively fuse the embedded representations of software nodes under different meta-paths. The test results demonstrate that the proposed IHODroid model effectively detects malicious software. On the DREBIN dataset, which consists of 123,453 Android applications and 5,560 malicious samples, the IHODroid model achieves an accuracy of 0.9360 and an F1 score of 0.9360, outperforming other state-of-the-art baseline methods.
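The abstract gives no implementation details, so the following is only a minimal sketch of two ideas it names: building App-to-App links along a meta-path, and fusing per-meta-path embeddings with attention weights. It is not the IHODroid code. The function names (`metapath_adjacency`, `semantic_fusion`), the toy matrices, and the attention vector `q` are all hypothetical, and the scoring rule is a simplified form that omits the learned projection a trained model would use.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def metapath_adjacency(app_to_x):
    """App-X-App meta-path: link two apps if they share at least one entity X.

    app_to_x is a 0/1 matrix with one row per app and one column per entity
    (for example an API or a permission). Self-links are dropped.
    """
    x_to_app = [list(col) for col in zip(*app_to_x)]
    counts = matmul(app_to_x, x_to_app)
    n = len(counts)
    return [[1 if i != j and counts[i][j] > 0 else 0 for j in range(n)]
            for i in range(n)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def semantic_fusion(embeddings, q):
    """Fuse one embedding matrix per meta-path into a single matrix.

    Assumption: each meta-path is scored by the node-averaged value of
    q . tanh(z); the scores are softmax-normalised into fusion weights.
    """
    scores = []
    for emb in embeddings:
        per_node = [sum(qk * math.tanh(zk) for qk, zk in zip(q, z)) for z in emb]
        scores.append(sum(per_node) / len(per_node))
    weights = softmax(scores)
    n, dim = len(embeddings[0]), len(embeddings[0][0])
    fused = [[sum(w * emb[i][k] for w, emb in zip(weights, embeddings))
              for k in range(dim)] for i in range(n)]
    return weights, fused

# Toy data (hypothetical): 3 apps, 3 APIs, 2 permissions.
app_api = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
app_perm = [[1, 0], [0, 1], [1, 0]]
adj_api = metapath_adjacency(app_api)    # App-API-App links
adj_perm = metapath_adjacency(app_perm)  # App-Permission-App links

# Hypothetical per-meta-path app embeddings (3 apps, 2 dimensions each).
emb_api = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.7]]
emb_perm = [[0.2, 0.6], [0.5, 0.5], [0.3, 0.4]]
weights, fused = semantic_fusion([emb_api, emb_perm], q=[1.0, 0.5])
```

In this toy setup the two meta-paths connect different pairs of apps, which is why a fusion step that weights each view separately can be useful. In a trained model the attention vector and the embeddings would be learned rather than fixed by hand.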
Pages: 16