Approach for the Optimization of Machine Learning Models for Calculating Binary Function Similarity

Cited by: 0
Authors
Horimoto, Suguru [1, 2]
Lucas, Keane [3]
Bauer, Lujo [3]
Affiliations
[1] Natl Police Agcy, Tokyo, Japan
[2] Carnegie Mellon Univ, CyLab Secur & Privacy Inst, Pittsburgh, PA 15213 USA
[3] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Funding
Andrew Mellon Foundation (USA);
Keywords
Malware analysis; Graph learning; Similarity;
DOI
10.1007/978-3-031-64171-8_16
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
Binary function similarity comparison is essential in a variety of security fields, such as software vulnerability detection and malware analysis, because it lets engineers accelerate otherwise time-consuming tasks. Various approaches to binary function similarity comparison have been proposed; in prior work that fairly evaluated existing methods, a method combining a graph neural network (GNN) with bag-of-words (BoW) exhibited the highest performance. In that method, each basic block (BB) in a function is embedded into a vector by BoW, so the function vector is derived from sparse vectors. In this paper, we propose a method that combines a GNN with fastText instead of BoW. Furthermore, to optimize machine learning models for calculating binary function similarity, we apply early stopping based on mean reciprocal rank (MRR) during training. Our method outperformed the previous GNN and BoW method by up to 2% in AUC, up to 9% in Recall@1, and up to 7% in MRR10 in a certain case. Additionally, a function search case study in malware analysis showed that our method is applicable to finding distinctive functions present in LockBit ransomware.
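The MRR-based early stopping mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `rank_of_positive`, `mean_reciprocal_rank`, and `MRREarlyStopping`, the cutoff `k=10`, and the `patience` and `min_delta` parameters are all hypothetical choices made for illustration, since the abstract does not give these details.

```python
from typing import Sequence


def rank_of_positive(scores: Sequence[float], positive_index: int) -> int:
    """1-based rank of the true match when candidates are sorted by
    descending similarity score (ties are resolved in its favor)."""
    target = scores[positive_index]
    return 1 + sum(1 for s in scores if s > target)


def mean_reciprocal_rank(ranks: Sequence[int], k: int = 10) -> float:
    """MRR with cutoff k: a query whose true match ranks below k scores 0."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1.0 / r if r <= k else 0.0 for r in ranks) / len(ranks)


class MRREarlyStopping:
    """Stop training once validation MRR has not improved for `patience`
    consecutive evaluations (hypothetical policy, assumed for this sketch)."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, mrr: float) -> bool:
        """Record one validation MRR; return True if training should stop."""
        if mrr > self.best + self.min_delta:
            self.best = mrr
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Toy validation MRR values, one per epoch (made-up numbers).
stopper = MRREarlyStopping(patience=2)
decisions = [stopper.step(m) for m in (0.50, 0.60, 0.59, 0.58)]
```

In a training loop, `step` would be called after each validation pass, and the checkpoint with the best MRR would be the one kept. Monitoring a ranking metric rather than the training loss ties the stopping decision to the retrieval task the model is evaluated on.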
Pages: 309-329
Number of pages: 21