ConFunc: Enhanced Binary Function-Level Representation through Contrastive Learning

Cited by: 0
Authors
Li, Longfei [1]
Yin, Xiaokang [2]
Li, Xiao [2]
Zhu, Xiaoya [2]
Liu, Shengli [2]
Affiliations
[1] Zhengzhou University, Zhengzhou, People's Republic of China
[2] Information Engineering University, Zhengzhou, People's Republic of China
Keywords
binary code similarity detection; machine learning; contrastive learning; function embeddings
DOI
10.1109/TrustCom60117.2023.00169
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Binary code similarity detection (BCSD) has numerous applications, including malware detection, vulnerability search, plagiarism detection, and patch identification. Recent studies show that, with the rapid progress of machine learning (ML), ML-based BCSD approaches outperform traditional methods. However, current ML-based BCSD approaches tend to overlook the issue of training samples, and most of them rely on supervised learning, which suffers from the difficulty of labelling data. To mitigate these issues, we propose ConFunc, a function-level binary code similarity detection framework based on contrastive learning. Performance evaluation shows that ConFunc improves the Mean Reciprocal Rank (MRR) and Recall@1 of baseline models by fully exploiting the available data. ConFunc also performs better when data is scarce: using only 10% of the complete dataset, it matches the performance that the baseline model achieves on the entire dataset. In real-world patch identification and vulnerability search tasks, ConFunc consistently outperforms the baseline models in MRR and Recall@10.
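The abstract reports results as MRR and Recall@K. As a minimal sketch of how these two retrieval metrics are usually computed (not the authors' evaluation code), the Python below assumes each query function comes with a ranked list of candidate function identifiers and exactly one correct match. The function names and the identifiers such as `f1` are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def reciprocal_rank(ranked_ids: Sequence[str], target_id: str) -> float:
    """Return 1/rank (1-based) of target_id in ranked_ids, or 0.0 if absent."""
    for position, candidate in enumerate(ranked_ids, start=1):
        if candidate == target_id:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]],
                         targets: Sequence[str]) -> float:
    """Average the reciprocal rank of the correct match over all queries."""
    total = sum(reciprocal_rank(r, t) for r, t in zip(rankings, targets))
    return total / len(targets)


def recall_at_k(rankings: Sequence[Sequence[str]],
                targets: Sequence[str], k: int) -> float:
    """Fraction of queries whose correct match appears in the top k results."""
    hits = sum(1 for r, t in zip(rankings, targets) if t in r[:k])
    return hits / len(targets)


# Hypothetical example: three queries, each with a ranked candidate list.
example_rankings = [
    ["f1", "f2", "f3"],  # correct match f1 is ranked 1st
    ["f9", "f4", "f5"],  # correct match f4 is ranked 2nd
    ["f7", "f8", "f6"],  # correct match f6 is ranked 3rd
]
example_targets = ["f1", "f4", "f6"]

print(mean_reciprocal_rank(example_rankings, example_targets))
print(recall_at_k(example_rankings, example_targets, k=1))
print(recall_at_k(example_rankings, example_targets, k=10))
```

In this toy data the reciprocal ranks are 1, 1/2, and 1/3, so MRR is their mean (11/18), Recall@1 is 1/3, and Recall@10 is 1. MRR rewards placing the correct match near the top, while Recall@K only checks whether it appears within the first K candidates.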
Pages: 1241-1248
Page count: 8