API2Vec++: Boosting API Sequence Representation for Malware Detection and Classification

Cited by: 1
Authors
Cui, Lei [1]
Yin, Junnan [1]
Cui, Jiancong [2]
Ji, Yuede [3]
Liu, Peng [4]
Hao, Zhiyu [1]
Yun, Xiaochun [1]
Affiliations
[1] Zhongguancun Lab, Beijing 100093, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing 100093, Peoples R China
[3] Univ Texas Arlington, Dept Comp Sci & Engn, Arlington, TX 76010 USA
[4] Guangxi Normal Univ, Guilin 541004, Guangxi, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Malware; Logic; Legged locomotion; Task analysis; Feature extraction; Encoding; Runtime; Malware detection; malware classification; path embedding; BERT; random walk;
DOI
10.1109/TSE.2024.3422990
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Analyzing malware based on API call sequences is an effective approach, as these sequences reflect the dynamic execution behavior of malware. Recent advancements in deep learning have facilitated the application of these techniques to mine valuable information from API call sequences. However, these methods typically operate on raw sequences and may not effectively capture crucial information, especially in the case of multi-process malware, due to the API call interleaving problem. Furthermore, they often fail to capture contextual behaviors within or across processes, which is particularly important for identifying and classifying malicious activities. Motivated by this, we present API2Vec++, a graph-based API embedding method for malware detection and classification. First, we construct a graph model to represent the raw sequence. Specifically, we design the Temporal Process Graph (TPG) to model inter-process behaviors and the Temporal API Property Graph (TAPG) to model intra-process behaviors. Compared to our previous graph model, the TAPG model exposes operations with associated behaviors within the process through node properties and thus enhances detection and classification abilities. Using these graphs, we develop a heuristic random walk algorithm to generate numerous paths that can capture fine-grained malicious familial behavior. By pre-training these paths using the BERT model, we generate embeddings of paths and APIs, which can then be used for malware detection and classification. Experiments on a real-world malware dataset demonstrate that API2Vec++ outperforms state-of-the-art embedding methods and detection/classification methods in both accuracy and robustness, particularly for multi-process malware.
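The pipeline described in the abstract (separate interleaved calls by process, build a graph over them, sample paths by random walk) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the trace format, the function names, and the uniform time-respecting walk are hypothetical and are not the paper's TPG/TAPG definitions or its heuristic walk, and the BERT pre-training step is omitted.

```python
import random
from collections import defaultdict

# Hypothetical trace: (timestamp, pid, api_name), interleaved across two processes.
trace = [
    (1, 100, "CreateFileW"),
    (2, 200, "RegOpenKeyExW"),
    (3, 100, "WriteFile"),
    (4, 200, "RegSetValueExW"),
    (5, 100, "CloseHandle"),
    (6, 100, "CreateProcessW"),
]

def split_by_process(trace):
    """Group API calls by pid in time order, undoing the interleaving."""
    per_proc = defaultdict(list)
    for ts, pid, api in sorted(trace):
        per_proc[pid].append((ts, api))
    return dict(per_proc)

def build_api_graph(calls):
    """Directed graph for one process: edge a -> b stores the timestamps
    at which b was called immediately after a."""
    graph = defaultdict(lambda: defaultdict(list))
    for (_, a), (ts_b, b) in zip(calls, calls[1:]):
        graph[a][b].append(ts_b)
    return {a: dict(nbrs) for a, nbrs in graph.items()}

def temporal_walk(graph, start, max_len, rng):
    """Uniform random walk that only follows edges later than the current time
    (a simplification; the paper uses its own heuristic)."""
    path, now, node = [start], float("-inf"), start
    while len(path) < max_len:
        candidates = [
            (nxt, t)
            for nxt, times in graph.get(node, {}).items()
            for t in times
            if t > now
        ]
        if not candidates:
            break
        node, now = rng.choice(candidates)
        path.append(node)
    return path

per_proc = split_by_process(trace)
graph_100 = build_api_graph(per_proc[100])
walk = temporal_walk(graph_100, "CreateFileW", max_len=10, rng=random.Random(0))
print(walk)
```

Paths sampled this way are plain token sequences, so in principle they could be fed to any sequence model as a corpus; how the paper encodes node properties and process relations into those paths is not recoverable from the abstract alone.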
Pages: 2142-2162
Page count: 21