Approach for the Optimization of Machine Learning Models for Calculating Binary Function Similarity

Authors
Horimoto, Suguru [1, 2]
Lucas, Keane [3 ]
Bauer, Lujo [3 ]
Affiliations
[1] Natl Police Agcy, Tokyo, Japan
[2] Carnegie Mellon Univ, CyLab Secur & Privacy Inst, Pittsburgh, PA 15213 USA
[3] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Funding
Andrew W. Mellon Foundation (USA);
Keywords
Malware analysis; Graph learning; Similarity;
DOI
10.1007/978-3-031-64171-8_16
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Binary function similarity comparison is essential in a variety of security fields, such as software vulnerability detection and malware analysis, because it enables engineers to accelerate otherwise time-consuming tasks. Various approaches to binary function similarity comparison have been proposed; in a previous study that fairly evaluated existing methods, a method combining a graph neural network (GNN) with bag-of-words (BoW) exhibited the highest performance. In that method, each basic block (BB) in a function is embedded into a vector by BoW, so the function vector is derived from sparse vectors. In this paper, we propose a method that combines a GNN with fastText instead of BoW. Furthermore, to optimize machine learning models for calculating binary function similarity, we apply early stopping based on mean reciprocal rank (MRR) during training. Our method outperformed the previous GNN and BoW method by up to 2% in AUC, up to 9% in Recall@1, and up to 7% in MRR10 in a certain case. Additionally, a function search case study in malware analysis showed that our method can be applied to find distinctive functions present in LockBit ransomware.
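The ranking metrics and the MRR-based early stopping mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names (`reciprocal_rank`, `mrr_at_k`, `recall_at_1`, `MRREarlyStopping`), the `patience` parameter, and the cutoff `k=10` (taken here as the meaning of MRR10) are assumptions made for illustration only.

```python
from typing import List, Optional, Sequence


def reciprocal_rank(ranked_ids: Sequence[str], positive_id: str, k: int = 10) -> float:
    """Return 1/rank of the positive candidate within the top k, else 0.0."""
    for rank, candidate in enumerate(ranked_ids[:k], start=1):
        if candidate == positive_id:
            return 1.0 / rank
    return 0.0


def mrr_at_k(rankings: List[Sequence[str]], positives: List[str], k: int = 10) -> float:
    """Mean reciprocal rank over all queries, truncated at rank k."""
    total = sum(reciprocal_rank(r, p, k) for r, p in zip(rankings, positives))
    return total / len(positives)


def recall_at_1(rankings: List[Sequence[str]], positives: List[str]) -> float:
    """Fraction of queries whose top-ranked candidate is the positive one."""
    hits = sum(1 for r, p in zip(rankings, positives) if r and r[0] == p)
    return hits / len(positives)


class MRREarlyStopping:
    """Stop training once validation MRR has not improved for `patience` epochs.

    Hypothetical helper: the source does not describe its stopping rule in detail.
    """

    def __init__(self, patience: int = 3) -> None:
        self.patience = patience
        self.best = float("-inf")
        self.best_epoch: Optional[int] = None
        self.bad_epochs = 0

    def update(self, epoch: int, mrr: float) -> bool:
        """Record this epoch's validation MRR; return True if training should stop."""
        if mrr > self.best:
            self.best = mrr
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, one would call `update(epoch, mrr_at_k(...))` after each validation pass and restore the checkpoint saved at `best_epoch` once it returns `True`. Monitoring a ranking metric rather than the training loss ties the stopping point to the retrieval quality that function search depends on.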
Pages: 309 - 329
Number of pages: 21