Approach for the Optimization of Machine Learning Models for Calculating Binary Function Similarity

Cited by: 0
Authors
Horimoto, Suguru [1 ,2 ]
Lucas, Keane [3 ]
Bauer, Lujo [3 ]
Affiliations
[1] Natl Police Agcy, Tokyo, Japan
[2] Carnegie Mellon Univ, CyLab Secur & Privacy Inst, Pittsburgh, PA 15213 USA
[3] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Funding
Andrew Mellon Foundation (United States);
Keywords
Malware analysis; Graph learning; Similarity;
DOI
10.1007/978-3-031-64171-8_16
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Binary function similarity comparison is essential in a variety of security fields, such as software vulnerability detection and malware analysis, because it enables engineers to accelerate otherwise time-consuming tasks. Various approaches to binary function similarity comparison have been proposed; in previous work that fairly evaluated existing methods, a method combining a graph neural network (GNN) with bag-of-words (BoW) exhibited the highest performance. In that method, each basic block (BB) in a function is embedded into a vector by BoW, so the function vector is derived from sparse vectors. In this paper, we propose a method that combines a GNN with fastText instead of BoW. Furthermore, to optimize machine learning models for calculating binary function similarity, we apply early stopping based on mean reciprocal rank (MRR) during training. Our method outperformed the previous method combining GNN and BoW by up to 2% in AUC, up to 9% in Recall@1, and up to 7% in MRR10 in a certain case. Additionally, a function search case study in malware analysis showed that our method can be applied to find distinctive functions present in LockBit ransomware.
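Illustrative note (not part of the published abstract): the abstract names MRR-based early stopping and Recall@1 without defining them. The sketch below shows one common way to compute these ranking metrics and to pick a stopping epoch from a validation MRR curve. It is a minimal sketch, not the authors' implementation; the function names, the cutoff `k=10`, and the `patience` parameter are assumptions made for illustration only.

```python
from typing import Sequence


def mrr_at_k(ranks: Sequence[int], k: int = 10) -> float:
    """Mean reciprocal rank with cutoff k (hypothetical helper).

    `ranks` holds, for each query function, the 1-based position of the
    true match in the ranked candidate list. Ranks beyond k score 0.
    """
    if not ranks:
        return 0.0
    return sum(1.0 / r if r <= k else 0.0 for r in ranks) / len(ranks)


def recall_at_1(ranks: Sequence[int]) -> float:
    """Fraction of queries whose true match is ranked first."""
    if not ranks:
        return 0.0
    return sum(1 for r in ranks if r == 1) / len(ranks)


def early_stopping_epoch(val_mrr: Sequence[float], patience: int = 2) -> int:
    """Return the index of the best epoch seen before stopping.

    Training is assumed to stop once validation MRR has failed to improve
    for `patience` consecutive epochs (an assumed stopping rule).
    """
    best_epoch, best_score, waited = -1, float("-inf"), 0
    for epoch, score in enumerate(val_mrr):
        if score > best_score:
            best_epoch, best_score, waited = epoch, score, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch


if __name__ == "__main__":
    ranks = [1, 2, 11, 4]  # made-up ranks for four query functions
    print(mrr_at_k(ranks), recall_at_1(ranks))
    print(early_stopping_epoch([0.40, 0.45, 0.44, 0.43, 0.50]))
```

With a patience of 2, the made-up validation curve above stops after the fourth epoch and keeps the model from the second, so the later improvement is never seen; this is the usual trade-off when choosing the patience value.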
Pages: 309 - 329
Page count: 21
Related papers
50 in total
  • [31] An optimization algorithm guided by a machine learning approach
    Erik Cuevas
    Jorge Galvez
    International Journal of Machine Learning and Cybernetics, 2019, 10 : 2963 - 2991
  • [32] Machine learning models: Combining evidence of similarity for XML schema matching
    Hong-Minh, Tran
    Smith, Dan
    KNOWLEDGE DISCOVERY FROM XML DOCUMENTS, PROCEEDINGS, 2006, 3915 : 43 - 53
  • [33] Generalization and similarity in exemplar models of categorization: Insights from machine learning
    Frank Jäkel
    Bernhard Schölkopf
    Felix A. Wichmann
    Psychonomic Bulletin & Review, 2008, 15 : 256 - 271
  • [34] Generalization and similarity in exemplar models of categorization: Insights from machine learning
    Jaekel, Frank
    Schoelkopf, Bernhard
    Wichmann, Felix A.
    PSYCHONOMIC BULLETIN & REVIEW, 2008, 15 (02) : 256 - 271
  • [35] Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs
    Zuo, Fei
    Li, Xiaopeng
    Young, Patrick
    Luo, Lannan
    Zeng, Qiang
    Zhang, Zhexin
    26TH ANNUAL NETWORK AND DISTRIBUTED SYSTEM SECURITY SYMPOSIUM (NDSS 2019), 2019
  • [36] Evaluation of coupled machine learning models for drilling optimization
    Hegde, Chiranth
    Gray, Ken
    JOURNAL OF NATURAL GAS SCIENCE AND ENGINEERING, 2018, 56 : 397 - 407
  • [37] Machine learning models to support reservoir production optimization
    Teixeira, Alex F.
    Secchi, Argimiro R.
    IFAC PAPERSONLINE, 2019, 52 (01) : 498 - 501
  • [38] Cross Architecture Function Similarity Detection with Binary Lifting and Neural Metric Learning
    Tian, Zhenzhou
    Li, Chen
    Qiu, Sihao
    ADVANCES IN NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, ICNC-FSKD 2022, 2023, 153 : 27 - 34
  • [39] Similarity measure learning for image retrieval using binary component discriminating function
    Ye, HJ
    Xu, GY
    2003 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOL 1, PROCEEDINGS, 2003, : 717 - 720
  • [40] A New Approach for Calculating Similarity of Categorical Data
    Jin, Cheng Hao
    Li, Xun
    Lee, Yang Koo
    Pok, Gouchol
    Ryu, Keun Ho
    CONVERGENCE AND HYBRID INFORMATION TECHNOLOGY, 2011, 206 : 584 - +