Approach for the Optimization of Machine Learning Models for Calculating Binary Function Similarity

Cited by: 0
Authors
Horimoto, Suguru [1 ,2 ]
Lucas, Keane [3 ]
Bauer, Lujo [3 ]
Affiliations
[1] Natl Police Agcy, Tokyo, Japan
[2] Carnegie Mellon Univ, CyLab Secur & Privacy Inst, Pittsburgh, PA 15213 USA
[3] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Funding
Andrew Mellon Foundation (USA);
Keywords
Malware analysis; Graph learning; Similarity;
DOI
10.1007/978-3-031-64171-8_16
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Binary function similarity comparison is essential in a variety of security fields, such as software vulnerability detection and malware analysis, because it enables engineers to accelerate otherwise time-consuming tasks. Various approaches to binary function similarity comparison have been proposed; in a previous study designed to evaluate existing methods fairly, a method combining a graph neural network (GNN) with bag-of-words (BoW) achieved the highest performance. In that method, each basic block (BB) in a function is embedded into a vector by BoW, so the function vector is derived from sparse vectors. In this paper, we propose a method that combines a GNN with fastText instead of BoW. Furthermore, to optimize machine learning models for calculating binary function similarity, we apply early stopping based on mean reciprocal rank (MRR) during training. Our method outperformed the previous GNN-and-BoW method by up to 2% in AUC, up to 9% in Recall@1, and up to 7% in MRR10 in a certain case. Additionally, a function search case study in malware analysis showed that our method can be used to find distinctive functions present in LockBit ransomware.
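The abstract names MRR-based early stopping and reports Recall@1 and MRR10. The sketch below illustrates how such ranking metrics and a stopping rule could be computed. It is not the authors' implementation: the function names, the `MRREarlyStopping` class, the `patience` parameter, and the reading of MRR10 as MRR over the top 10 results are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def reciprocal_rank(ranked_ids: Sequence[str], target_id: str, k: int = 10) -> float:
    """Return 1/rank of target_id within the top-k results, or 0.0 if it is absent."""
    for rank, candidate in enumerate(ranked_ids[:k], start=1):
        if candidate == target_id:
            return 1.0 / rank
    return 0.0


def mrr_at_k(rankings: Sequence[Sequence[str]], targets: Sequence[str], k: int = 10) -> float:
    """Mean reciprocal rank over all queries, truncated at rank k (assumed meaning of MRR10)."""
    total = sum(reciprocal_rank(r, t, k) for r, t in zip(rankings, targets))
    return total / len(targets)


def recall_at_k(rankings: Sequence[Sequence[str]], targets: Sequence[str], k: int = 1) -> float:
    """Fraction of queries whose true match appears within the top-k results."""
    hits = sum(1 for r, t in zip(rankings, targets) if t in r[:k])
    return hits / len(targets)


class MRREarlyStopping:
    """Hypothetical helper: stop once validation MRR has not improved for `patience` epochs."""

    def __init__(self, patience: int = 3) -> None:
        self.patience = patience
        self.best = float("-inf")
        self.best_epoch: Optional[int] = None
        self.bad_epochs = 0

    def update(self, epoch: int, mrr: float) -> bool:
        """Record this epoch's validation MRR; return True when training should stop."""
        if mrr > self.best:
            self.best = mrr
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Toy validation data: each query function has one true match in a ranked candidate list.
rankings = [["f1", "f2", "f3"], ["f9", "f4", "f5"], ["f7", "f8", "f6"]]
targets = ["f1", "f4", "f6"]
val_mrr = mrr_at_k(rankings, targets, k=10)
val_recall1 = recall_at_k(rankings, targets, k=1)

# Simulated per-epoch validation MRR values (made up for the example).
stopper = MRREarlyStopping(patience=2)
stopped_at = None
for epoch, mrr in enumerate([0.50, 0.55, 0.54, 0.53, 0.60]):
    if stopper.update(epoch, mrr):
        stopped_at = epoch
        break
```

In this toy run the true matches sit at ranks 1, 2, and 3, so the reciprocal ranks are 1, 1/2, and 1/3. Selecting the checkpoint by a ranking metric rather than by training loss ties model selection to the retrieval task that Recall@1 and MRR10 measure.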
Pages: 309-329
Page count: 21