API2Vec++: Boosting API Sequence Representation for Malware Detection and Classification

Cited by: 1

Authors:

Cui, Lei [1]

Yin, Junnan [1]

Cui, Jiancong [2]

Ji, Yuede [3]

Liu, Peng [4]

Hao, Zhiyu [1]

Yun, Xiaochun [1]

Affiliations:

[1] Zhongguancun Lab, Beijing 100093, Peoples R China

[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing 100093, Peoples R China

[3] Univ Texas Arlington, Dept Comp Sci & Engn, Arlington, TX 76010 USA

[4] Guangxi Normal Univ, Guilin 541004, Guangxi, Peoples R China

Source:

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING | 2024 / Vol. 50 / No. 08

Funding:

National Natural Science Foundation of China;

Keywords:

Malware; Logic; Legged locomotion; Task analysis; Feature extraction; Encoding; Runtime; Malware detection; malware classification; path embedding; BERT; random walk;

DOI:

10.1109/TSE.2024.3422990

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Subject classification code:

081202 ; 0835 ;

Abstract:

Analyzing malware based on API call sequences is an effective approach, as these sequences reflect the dynamic execution behavior of malware. Recent advancements in deep learning have facilitated the application of these techniques to mine valuable information from API call sequences. However, these methods typically operate on raw sequences and may not effectively capture crucial information, especially in the case of multi-process malware, due to the API call interleaving problem. Furthermore, they often fail to capture contextual behaviors within or across processes, which is particularly important for identifying and classifying malicious activities. Motivated by this, we present API2Vec++, a graph-based API embedding method for malware detection and classification. First, we construct a graph model to represent the raw sequence. Specifically, we design the Temporal Process Graph (TPG) to model inter-process behaviors and the Temporal API Property Graph (TAPG) to model intra-process behaviors. Compared to our previous graph model, the TAPG model exposes operations with associated behaviors within the process through node properties and thus enhances detection and classification abilities. Using these graphs, we develop a heuristic random walk algorithm to generate numerous paths that can capture fine-grained malicious familial behavior. By pre-training these paths using the BERT model, we generate embeddings of paths and APIs, which can then be used for malware detection and classification. Experiments on a real-world malware dataset demonstrate that API2Vec++ outperforms state-of-the-art embedding methods and detection/classification methods in both accuracy and robustness, particularly for multi-process malware.
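The pipeline outlined in the abstract (separate an interleaved trace by process, model each process as a graph, then sample paths by random walk) can be illustrated with a minimal sketch. This is not the authors' TPG/TAPG construction or their heuristic walk: the trace format, the function names `split_by_process`, `build_transition_graph`, and `random_walk`, and the frequency-weighted walk are all assumptions chosen only to show the general idea. The BERT pre-training step is omitted.

```python
import random
from collections import defaultdict

# Hypothetical trace: (timestamp, process id, API name), interleaved across processes.
TRACE = [
    (1, 100, "CreateFileW"),
    (2, 200, "RegOpenKeyExW"),
    (3, 100, "WriteFile"),
    (4, 200, "RegSetValueExW"),
    (5, 100, "CloseHandle"),
    (6, 100, "CreateFileW"),
    (7, 100, "WriteFile"),
]


def split_by_process(trace):
    """Group an interleaved trace into one time-ordered API sequence per process."""
    per_process = defaultdict(list)
    for _, pid, api in sorted(trace):  # tuples sort by timestamp first
        per_process[pid].append(api)
    return dict(per_process)


def build_transition_graph(sequence):
    """Directed graph whose edge weights count consecutive API pairs (an assumed, simplified model)."""
    graph = defaultdict(lambda: defaultdict(int))
    for src, dst in zip(sequence, sequence[1:]):
        graph[src][dst] += 1
    return {src: dict(dsts) for src, dsts in graph.items()}


def random_walk(graph, start, max_len, rng):
    """Sample a path of at most max_len nodes, choosing successors in proportion to edge weight."""
    path = [start]
    while len(path) < max_len and path[-1] in graph:
        successors = graph[path[-1]]
        nodes = list(successors)
        weights = [successors[n] for n in nodes]
        path.append(rng.choices(nodes, weights=weights, k=1)[0])
    return path


if __name__ == "__main__":
    rng = random.Random(0)
    for pid, seq in split_by_process(TRACE).items():
        graph = build_transition_graph(seq)
        print(pid, random_walk(graph, seq[0], max_len=5, rng=rng))
```

Splitting by process before building the graph is what removes the interleaving in this toy example: the file operations of process 100 and the registry operations of process 200 end up in separate sequences, so no spurious edge such as `CreateFileW -> RegOpenKeyExW` is created.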

Pages: 2142 - 2162

Number of pages: 21
