API2Vec++: Boosting API Sequence Representation for Malware Detection and Classification

Cited by: 1
Authors
Cui, Lei [1]
Yin, Junnan [1]
Cui, Jiancong [2]
Ji, Yuede [3]
Liu, Peng [4]
Hao, Zhiyu [1]
Yun, Xiaochun [1]
Affiliations
[1] Zhongguancun Lab, Beijing 100093, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing 100093, Peoples R China
[3] Univ Texas Arlington, Dept Comp Sci & Engn, Arlington, TX 76010 USA
[4] Guangxi Normal Univ, Guilin 541004, Guangxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Malware; Logic; Legged locomotion; Task analysis; Feature extraction; Encoding; Runtime; Malware detection; malware classification; path embedding; BERT; random walk;
DOI
10.1109/TSE.2024.3422990
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Analyzing malware based on API call sequences is an effective approach, as these sequences reflect the dynamic execution behavior of malware. Recent advancements in deep learning have facilitated the application of these techniques to mine valuable information from API call sequences. However, these methods typically operate on raw sequences and may not effectively capture crucial information, especially in the case of multi-process malware, due to the API call interleaving problem. Furthermore, they often fail to capture contextual behaviors within or across processes, which is particularly important for identifying and classifying malicious activities. Motivated by this, we present API2Vec++, a graph-based API embedding method for malware detection and classification. First, we construct a graph model to represent the raw sequence. Specifically, we design the Temporal Process Graph (TPG) to model inter-process behaviors and the Temporal API Property Graph (TAPG) to model intra-process behaviors. Compared to our previous graph model, the TAPG model exposes operations with associated behaviors within the process through node properties and thus enhances detection and classification abilities. Using these graphs, we develop a heuristic random walk algorithm to generate numerous paths that can capture fine-grained malicious familial behavior. By pre-training these paths using the BERT model, we generate embeddings of paths and APIs, which can then be used for malware detection and classification. Experiments on a real-world malware dataset demonstrate that API2Vec++ outperforms state-of-the-art embedding methods and detection/classification methods in both accuracy and robustness, particularly for multi-process malware.
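The interleaving problem and the path-generation idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' TPG/TAPG construction or their heuristic random walk, whose details are not given here. It is a simplified stand-in that assumes a hypothetical trace format of `(timestamp, pid, api_name)` tuples, separates the interleaved calls per process, builds a small time-stamped transition graph for each process, and samples a walk that only follows edges in increasing time order. The function names `split_by_process`, `build_temporal_graph`, and `temporal_walk` are illustrative only.

```python
import random
from collections import defaultdict

# Hypothetical interleaved trace from two processes: (timestamp, pid, api_name).
trace = [
    (0, 100, "CreateFileW"),
    (1, 200, "RegOpenKeyExW"),
    (2, 100, "WriteFile"),
    (3, 200, "RegSetValueExW"),
    (4, 100, "CloseHandle"),
    (5, 200, "RegCloseKey"),
]


def split_by_process(trace):
    """Undo the interleaving: group calls by pid, keeping time order."""
    per_pid = defaultdict(list)
    for ts, pid, api in sorted(trace):
        per_pid[pid].append((ts, api))
    return dict(per_pid)


def build_temporal_graph(calls):
    """Map each API to its successors as (next_api, timestamp_of_next_call)."""
    edges = defaultdict(list)
    for (_, src), (ts, dst) in zip(calls, calls[1:]):
        edges[src].append((dst, ts))
    return edges


def temporal_walk(edges, start, max_len, rng):
    """Random walk that only follows edges later than the last one taken."""
    path, node, now = [start], start, -1
    while len(path) < max_len:
        candidates = [(dst, ts) for dst, ts in edges.get(node, []) if ts > now]
        if not candidates:
            break
        node, now = rng.choice(candidates)
        path.append(node)
    return path


per_pid = split_by_process(trace)
rng = random.Random(0)  # fixed seed so the sketch is reproducible
paths = {
    pid: temporal_walk(build_temporal_graph(calls), calls[0][1], 10, rng)
    for pid, calls in per_pid.items()
}
for pid, path in paths.items():
    print(pid, " -> ".join(path))
```

In this toy trace each process has a single linear call chain, so every walk simply recovers that process's own sequence. In a pipeline like the one the abstract describes, many such paths would then serve as the input "sentences" for pre-training a language model such as BERT.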
Pages: 2142 - 2162
Page count: 21
Related papers
50 in total
  • [21] MINES: Multi-perspective API Call Sequence Behavior Fusion Malware Classification
    Gao, Mohan
    Wu, Peng
    Pan, Li
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2024, PT IV, 2024, 14853 : 210 - 220
  • [22] Multi-perspective API call sequence behavior analysis and fusion for malware classification
    Wu, Peng
    Gao, Mohan
    Sun, Fuhui
    Wang, Xiaoyan
    Pan, Li
    COMPUTERS & SECURITY, 2025, 148
  • [23] Malware classification based on API calls and behaviour analysis
    Pektas, Abdurrahman
    Acarman, Tankut
    IET INFORMATION SECURITY, 2018, 12 (02) : 107 - 117
  • [24] SBRT: API Signature Behaviour Based Representation Technique for Improving Metamorphic Malware Detection
    Mohamed, Gamal A. N.
    Ithnin, Norafida Bte
    RECENT TRENDS IN INFORMATION AND COMMUNICATION TECHNOLOGY, 2018, 5 : 767 - 777
  • [25] A novel deep framework for dynamic malware detection based on API sequence intrinsic features
    Li, Ce
    Lv, Qiujian
    Li, Ning
    Wang, Yan
    Sun, Degang
    Qiao, Yuanyuan
    COMPUTERS & SECURITY, 2022, 116
  • [26] MalEXLNet: A semantic analysis and detection method of malware API sequence based on EXLNet model
    Mao, Xuedong
    Zhao, Yuntao
    Feng, Yongxin
    Hu, Yutao
    KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2024, 18 (10) : 3060 - 3083
  • [27] Mining API Calls and Permissions for Android Malware Detection
    Sharma, Akanksha
    Dash, Subrat Kumar
    CRYPTOLOGY AND NETWORK SECURITY, CANS 2014, 2014, 8813 : 191 - 205
  • [28] Dynamic Malware Analysis Based on API Sequence Semantic Fusion
    Zhang, Sanfeng
    Wu, Jiahao
    Zhang, Mengzhe
    Yang, Wang
    APPLIED SCIENCES-BASEL, 2023, 13 (11):
  • [29] Merging Permission and API Features for Android Malware Detection
    Qiao, Mengyu
    Sung, Andrew H.
    Liu, Qingzhong
    PROCEEDINGS 2016 5TH IIAI INTERNATIONAL CONGRESS ON ADVANCED APPLIED INFORMATICS IIAI-AAI 2016, 2016 : 566 - 571
  • [30] Malware detection using assembly and API call sequences
    Shankarapani, Madhu K.
    Ramamoorthy, Subbu
    Movva, Ram S.
    Mukkamala, Srinivas
    JOURNAL IN COMPUTER VIROLOGY AND HACKING TECHNIQUES, 2011, 7 (02) : 107 - 119