API2Vec++: Boosting API Sequence Representation for Malware Detection and Classification

Cited by: 1
Authors
Cui, Lei [1]
Yin, Junnan [1]
Cui, Jiancong [2]
Ji, Yuede [3]
Liu, Peng [4]
Hao, Zhiyu [1]
Yun, Xiaochun [1]
Affiliations
[1] Zhongguancun Lab, Beijing 100093, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing 100093, Peoples R China
[3] Univ Texas Arlington, Dept Comp Sci & Engn, Arlington, TX 76010 USA
[4] Guangxi Normal Univ, Guilin 541004, Guangxi, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Malware; Logic; Legged locomotion; Task analysis; Feature extraction; Encoding; Runtime; Malware detection; malware classification; path embedding; BERT; random walk
DOI
10.1109/TSE.2024.3422990
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Analyzing malware based on API call sequences is an effective approach, as these sequences reflect the dynamic execution behavior of malware. Recent advancements in deep learning have facilitated the application of these techniques to mine valuable information from API call sequences. However, these methods typically operate on raw sequences and may not effectively capture crucial information, especially in the case of multi-process malware, due to the API call interleaving problem. Furthermore, they often fail to capture contextual behaviors within or across processes, which is particularly important for identifying and classifying malicious activities. Motivated by this, we present API2Vec++, a graph-based API embedding method for malware detection and classification. First, we construct a graph model to represent the raw sequence. Specifically, we design the Temporal Process Graph (TPG) to model inter-process behaviors and the Temporal API Property Graph (TAPG) to model intra-process behaviors. Compared to our previous graph model, the TAPG model exposes operations with associated behaviors within the process through node properties and thus enhances detection and classification abilities. Using these graphs, we develop a heuristic random walk algorithm to generate numerous paths that can capture fine-grained malicious familial behavior. By pre-training these paths using the BERT model, we generate embeddings of paths and APIs, which can then be used for malware detection and classification. Experiments on a real-world malware dataset demonstrate that API2Vec++ outperforms state-of-the-art embedding methods and detection/classification methods in both accuracy and robustness, particularly for multi-process malware.
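The interleaving problem and the path-generation idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the trace format, the helper names (`split_by_process`, `build_temporal_graph`, `temporal_walk`), and the uniform time-respecting walk are hypothetical and are not the paper's TPG/TAPG models or its heuristic random walk. The BERT pre-training step is omitted.

```python
import random
from collections import defaultdict

# Hypothetical interleaved trace of two processes: (timestamp, pid, api_name).
TRACE = [
    (1, 100, "CreateFileW"),
    (2, 200, "RegOpenKeyExW"),
    (3, 100, "WriteFile"),
    (4, 200, "RegSetValueExW"),
    (5, 100, "CloseHandle"),
    (6, 100, "CreateFileW"),
    (7, 100, "ReadFile"),
]

def split_by_process(trace):
    """Group calls by pid, preserving time order within each process."""
    per_proc = defaultdict(list)
    for ts, pid, api in sorted(trace):
        per_proc[pid].append((ts, api))
    return dict(per_proc)

def build_temporal_graph(calls):
    """Map each API to its outgoing edges (next_api, timestamp of next call)."""
    graph = defaultdict(list)
    for (_, src), (ts, dst) in zip(calls, calls[1:]):
        graph[src].append((dst, ts))
    return dict(graph)

def temporal_walk(graph, start, max_len, rng):
    """Random walk that only follows edges with strictly increasing timestamps."""
    path, node, now = [start], start, 0
    while len(path) < max_len:
        candidates = [(dst, ts) for dst, ts in graph.get(node, []) if ts > now]
        if not candidates:
            break
        node, now = rng.choice(candidates)
        path.append(node)
    return path

per_proc = split_by_process(TRACE)
graph_100 = build_temporal_graph(per_proc[100])
walk = temporal_walk(graph_100, "CreateFileW", 5, random.Random(0))
```

Splitting by process ID first keeps the registry calls of process 200 from appearing between the file calls of process 100, and the timestamp check keeps each sampled path consistent with the observed execution order. In a pipeline like the one the abstract describes, many such paths would then serve as the training corpus for the embedding model.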
Pages: 2142-2162
Number of pages: 21
Related papers
50 in total
  • [1] API2Vec: Learning Representations of API Sequences for Malware Detection
    Cui, Lei
    Cui, Jiancong
    Ji, Yuede
    Hao, Zhiyu
    Li, Lun
    Ding, Zhenquan
    PROCEEDINGS OF THE 32ND ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON SOFTWARE TESTING AND ANALYSIS, ISSTA 2023, 2023, : 261 - 273
  • [2] Dynamic API call sequence visualisation for malware classification
    Tang, Mingdong
    Qian, Quan
    IET INFORMATION SECURITY, 2019, 13 (04) : 367 - 377
  • [3] Improvement of malware detection and classification using API call sequence alignment and visualization
    Hyunjoo Kim
    Jonghyun Kim
    Youngsoo Kim
    Ikkyun Kim
    Kuinam J. Kim
    Hyuncheol Kim
    Cluster Computing, 2019, 22 : 921 - 929
  • [4] Improvement of malware detection and classification using API call sequence alignment and visualization
    Kim, Hyunjoo
    Kim, Jonghyun
    Kim, Youngsoo
    Kim, Ikkyun
    Kim, Kuinam J.
    Kim, Hyuncheol
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2019, 22 (Suppl 1): : 921 - 929
  • [5] API-based features representation fusion for malware classification
    Belkhouche, Yassine
    2023 IEEE 47TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE, COMPSAC, 2023, : 1658 - 1662
  • [6] Malware Detection and Classification Based on Extraction of API Sequences
    Uppal, Dolly
    Sinha, Rakhi
    Mehra, Vishakha
    Jain, Vinesh
    2014 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2014, : 2337 - 2342
  • [7] Malware detection based on visualization of recombined API instruction sequence
    Yang, Hongyu
    Zhang, Yupei
    Zhang, Liang
    Cheng, Xiang
    CONNECTION SCIENCE, 2022, 34 (01) : 2630 - 2651
  • [8] ASSCA: API sequence and statistics features combined architecture for malware detection
    Lu Xiaofeng
    Jiang Fangshuo
    Zhou Xiao
    Yi Shengwei
    Sha Jing
    Lio, Pietro
    COMPUTER NETWORKS, 2019, 157 : 99 - 111
  • [9] A novel malware detection method based on API embedding and API parameters
    Bo Zhou
    Hai Huang
    Jun Xia
    Donghai Tian
    The Journal of Supercomputing, 2024, 80 : 2748 - 2766
  • [10] System API Vectorization for Malware Detection
    Shin, Kyounga
    Lee, Yunho
    Lim, Jungho
    Kang, Honggoo
    Lee, Sangjin
    IEEE ACCESS, 2023, 11 : 53788 - 53805