BinCola: Diversity-Sensitive Contrastive Learning for Binary Code Similarity Detection

Cited by: 1
Authors
Jiang, Shuai [1]
Fu, Cai [1]
He, Shuai [1]
Lv, Jianqiang [1]
Han, Lansheng [1]
Hu, Hong [2]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Cyber Sci & Engn, Wuhan 430074, Peoples R China
[2] Penn State Univ, Coll Informat Sci & Technol, University Pk, PA 16802 USA
Keywords
Feature extraction; Contrastive learning; Vectors; Source coding; Software; Semantics; Training; Diversity sensitive; binary analysis; similarity detection; attention mechanism; NETWORKS;
DOI
10.1109/TSE.2024.3411072
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Binary Code Similarity Detection (BCSD) is a fundamental binary analysis technique in software security. Recently, advanced deep learning algorithms have been integrated into BCSD platforms to achieve superior performance on well-known benchmarks. However, large real-world programs exhibit more complex diversity due to different compilers, various optimization levels, multiple architectures, and even obfuscation. Existing BCSD solutions suffer from low accuracy in such complicated real-world application scenarios. In this paper, we propose BinCola, a novel Transformer-based dual diversity-sensitive contrastive learning framework that comprehensively considers the diversity of compiler options and candidate functions in real-world application scenarios and employs the attention mechanism to fuse multi-granularity function features, enhancing generality and scalability. BinCola simultaneously compares multiple candidate functions across various compilation option scenarios to learn the differences caused by distinct compiler options and different candidate functions. We evaluate BinCola's performance in a variety of ways, including binary similarity detection and real-world vulnerability search in multiple application scenarios. The results demonstrate that BinCola achieves superior performance compared to state-of-the-art (SOTA) methods, with improvements of 2.80%, 33.62%, 22.41%, and 34.25% in cross-architecture, cross-optimization level, cross-compiler, and cross-obfuscation scenarios, respectively.
Pages: 2485-2497
Number of pages: 13