API2Vec++: Boosting API Sequence Representation for Malware Detection and Classification

Cited by: 1

Authors:

Cui, Lei [1]

Yin, Junnan [1]

Cui, Jiancong [2]

Ji, Yuede [3]

Liu, Peng [4]

Hao, Zhiyu [1]

Yun, Xiaochun [1]

Affiliations:

[1] Zhongguancun Lab, Beijing 100093, Peoples R China

[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing 100093, Peoples R China

[3] Univ Texas Arlington, Dept Comp Sci & Engn, Arlington, TX 76010 USA

[4] Guangxi Normal Univ, Guilin 541004, Guangxi, Peoples R China

Source:

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING | 2024 / Vol. 50 / No. 08

Funding:

National Natural Science Foundation of China;

Keywords:

Malware; Logic; Legged locomotion; Task analysis; Feature extraction; Encoding; Runtime; Malware detection; malware classification; path embedding; BERT; random walk;

DOI:

10.1109/TSE.2024.3422990

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification code:

081202; 0835;

Abstract:

Analyzing malware based on API call sequences is an effective approach, as these sequences reflect the dynamic execution behavior of malware. Recent advancements in deep learning have facilitated the application of these techniques to mine valuable information from API call sequences. However, these methods typically operate on raw sequences and may not effectively capture crucial information, especially in the case of multi-process malware, due to the API call interleaving problem. Furthermore, they often fail to capture contextual behaviors within or across processes, which is particularly important for identifying and classifying malicious activities. Motivated by this, we present API2Vec++, a graph-based API embedding method for malware detection and classification. First, we construct a graph model to represent the raw sequence. Specifically, we design the Temporal Process Graph (TPG) to model inter-process behaviors and the Temporal API Property Graph (TAPG) to model intra-process behaviors. Compared to our previous graph model, the TAPG model exposes operations with associated behaviors within the process through node properties and thus enhances detection and classification abilities. Using these graphs, we develop a heuristic random walk algorithm to generate numerous paths that can capture fine-grained malicious familial behavior. By pre-training these paths using the BERT model, we generate embeddings of paths and APIs, which can then be used for malware detection and classification. Experiments on a real-world malware dataset demonstrate that API2Vec++ outperforms state-of-the-art embedding methods and detection/classification methods in both accuracy and robustness, particularly for multi-process malware.
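The pipeline outlined in the abstract (de-interleaving a multi-process trace, modeling each process as a temporal graph, and sampling paths by random walk) can be illustrated with a minimal Python sketch. This is not the authors' TPG/TAPG construction or their heuristic walk: the function names, the made-up trace, the use of sequence positions as timestamps, and the uniform choice among time-respecting edges are all assumptions made for illustration only.

```python
import random
from collections import defaultdict


def split_by_process(trace):
    """Group an interleaved trace of (pid, api) events into per-process sequences."""
    per_process = defaultdict(list)
    for pid, api in trace:
        per_process[pid].append(api)
    return dict(per_process)


def build_temporal_graph(seq):
    """Directed graph over API names; each edge keeps the positions where it occurred.

    Assumption: the position in the per-process sequence stands in for a timestamp.
    """
    graph = defaultdict(lambda: defaultdict(list))
    for t, (src, dst) in enumerate(zip(seq, seq[1:])):
        graph[src][dst].append(t)
    return graph


def temporal_random_walk(graph, start, max_len, rng):
    """Random walk that only follows edges later in time than the last edge taken.

    Assumption: candidates are chosen uniformly; the paper describes a heuristic walk.
    """
    path, node, last_t = [start], start, -1
    while len(path) < max_len:
        candidates = [
            (dst, t)
            for dst, times in graph.get(node, {}).items()
            for t in times
            if t > last_t
        ]
        if not candidates:
            break
        node, last_t = rng.choice(candidates)
        path.append(node)
    return path


# Hypothetical interleaved trace from two processes (pid 1 and pid 2).
trace = [
    (1, "CreateFile"), (2, "socket"), (1, "WriteFile"),
    (2, "connect"), (1, "CloseHandle"), (2, "send"),
]
sequences = split_by_process(trace)
graph_p1 = build_temporal_graph(sequences[1])
walk_p1 = temporal_random_walk(graph_p1, "CreateFile", max_len=5, rng=random.Random(0))
```

In this toy trace each process has a single chain of calls, so the walk for process 1 simply recovers that chain. Paths sampled this way would then serve as the "sentences" for a sequence model such as BERT, which is the role the abstract assigns to them.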


Pages: 2142-2162

Number of pages: 21
