API2Vec++: Boosting API Sequence Representation for Malware Detection and Classification

Cited by: 1
Authors
Cui, Lei [1 ]
Yin, Junnan [1 ]
Cui, Jiancong [2 ]
Ji, Yuede [3 ]
Liu, Peng [4 ]
Hao, Zhiyu [1 ]
Yun, Xiaochun [1 ]
Affiliations
[1] Zhongguancun Lab, Beijing 100093, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing 100093, Peoples R China
[3] Univ Texas Arlington, Dept Comp Sci & Engn, Arlington, TX 76010 USA
[4] Guangxi Normal Univ, Guilin 541004, Guangxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Malware; Logic; Legged locomotion; Task analysis; Feature extraction; Encoding; Runtime; Malware detection; malware classification; path embedding; BERT; random walk;
DOI
10.1109/TSE.2024.3422990
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Analyzing malware based on API call sequences is an effective approach, as these sequences reflect the dynamic execution behavior of malware. Recent advancements in deep learning have facilitated the application of these techniques to mine valuable information from API call sequences. However, these methods typically operate on raw sequences and may not effectively capture crucial information, especially in the case of multi-process malware, due to the API call interleaving problem. Furthermore, they often fail to capture contextual behaviors within or across processes, which is particularly important for identifying and classifying malicious activities. Motivated by this, we present API2Vec++, a graph-based API embedding method for malware detection and classification. First, we construct a graph model to represent the raw sequence. Specifically, we design the Temporal Process Graph (TPG) to model inter-process behaviors and the Temporal API Property Graph (TAPG) to model intra-process behaviors. Compared to our previous graph model, the TAPG model exposes operations with associated behaviors within the process through node properties and thus enhances detection and classification abilities. Using these graphs, we develop a heuristic random walk algorithm to generate numerous paths that can capture fine-grained malicious familial behavior. By pre-training these paths using the BERT model, we generate embeddings of paths and APIs, which can then be used for malware detection and classification. Experiments on a real-world malware dataset demonstrate that API2Vec++ outperforms state-of-the-art embedding methods and detection/classification methods in both accuracy and robustness, particularly for multi-process malware.
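The interleaving problem and the graph-plus-random-walk idea in the abstract can be illustrated with a minimal sketch. This is not the authors' TPG/TAPG construction or their heuristic walk: the function names, the toy trace, and the rule of following only edges with strictly increasing time steps are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_by_process(trace):
    """Group an interleaved trace of (pid, api) events into per-process sequences."""
    per_process = defaultdict(list)
    for pid, api in trace:
        per_process[pid].append(api)
    return dict(per_process)


def build_temporal_graph(api_sequence):
    """Map each API to its successors, tagging every edge with its time step."""
    graph = defaultdict(list)
    for step, (src, dst) in enumerate(zip(api_sequence, api_sequence[1:])):
        graph[src].append((dst, step))
    return graph


def temporal_random_walk(graph, start, max_len, rng):
    """Random walk that only follows edges whose time steps strictly increase."""
    path, node, last_step = [start], start, -1
    while len(path) < max_len:
        candidates = [(dst, step) for dst, step in graph.get(node, [])
                      if step > last_step]
        if not candidates:
            break
        node, last_step = rng.choice(candidates)
        path.append(node)
    return path


# Hypothetical trace: two processes whose calls are interleaved in the raw log.
trace = [
    (1, "CreateFileW"), (2, "RegOpenKeyExW"), (1, "WriteFile"),
    (2, "RegSetValueExW"), (1, "CloseHandle"), (2, "RegCloseKey"),
]
sequences = split_by_process(trace)
graph = build_temporal_graph(sequences[1])
path = temporal_random_walk(graph, "CreateFileW", max_len=3, rng=random.Random(0))
print(path)
```

In the raw trace, the file operations of process 1 are separated by registry calls from process 2; grouping by process first restores each process's own order before any paths are sampled. Paths produced this way would then serve as input sequences for an embedding model.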
Pages: 2142 - 2162
Number of pages: 21
Related papers
50 in total
  • [31] Boosting API Misuse Detection via Integrating API Constraints from Multiple Sources
    Li, Can
    Zhang, Jingxuan
    Tang, Yixuan
    Li, Zhuhang
    Sun, Tianyue
    2024 IEEE/ACM 21ST INTERNATIONAL CONFERENCE ON MINING SOFTWARE REPOSITORIES, MSR, 2024: 14 - 26
  • [32] Malware Classification Using Dynamically Extracted API Call Embeddings
    Aggarwal, Sahil
    Di Troia, Fabio
    APPLIED SCIENCES-BASEL, 2024, 14 (13)
  • [33] Features Engineering for Malware Family Classification Based API Call
    Daeef, Ammar Yahya
    Al-Naji, Ali
    Chahl, Javaan
    COMPUTERS, 2022, 11 (11)
  • [34] A malware classification method based on directed API call relationships
    Ma, Cuihua
    Li, Zhenwan
    Long, Haixia
    Bilal, Anas
    Liu, Xiaowen
    PLOS ONE, 2025, 20 (03)
  • [35] Using API Calls for Sequence-Pattern Feature Mining-Based Malware Detection
    Balan, Gheorghe
    Gavrilut, Dragos Teodor
    Luchian, Henri
    INFORMATION SECURITY PRACTICE AND EXPERIENCE, ISPEC 2022, 2022, 13620: 233 - 251
  • [36] A Malware-Detection Method Using Deep Learning to Fully Extract API Sequence Features
    Zhang, Shuhui
    Gao, Mingyu
    Wang, Lianhai
    Xu, Shujiang
    Shao, Wei
    Kuang, Ruixue
    ELECTRONICS, 2025, 14 (01)
  • [37] Feature-Chain Based Malware Detection Using Multiple Sequence Alignment of API Call
    Kim, Hyun-Joo
    Kim, Jong-Hyun
    Kim, Jung-Tai
    Kim, Ik-Kyun
    Chung, Tai-Myung
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2016, E99D (04): 1071 - 1080
  • [38] A Multi-Perspective malware detection approach through behavioral fusion of API call sequence
    Amer, Eslam
    Zelinka, Ivan
    El-Sappagh, Shaker
    COMPUTERS & SECURITY, 2021, 110
  • [39] A dynamic Windows malware detection and prediction method based on contextual understanding of API call sequence
    Amer, Eslam
    Zelinka, Ivan
    COMPUTERS & SECURITY, 2020, 92
  • [40] Using feature generation from API calls for malware detection
    Salehi, Zahra
    Sami, Ashkan
    Ghiasi, Mahboobe
    Computer Fraud and Security, 2014, 2014 (09): 9 - 18