SCodeSearcher: soft contrastive learning for code search

Authors
Li, Jia [1]
Fang, Zheng [1]
Shi, Xianjie [1]
Jin, Zhi [1]
Liu, Fang [2]
Li, Jia [1]
Zhao, Yunfei [1]
Li, Ge [1]
Affiliations
[1] Peking Univ, Key Lab High Confidence Software Technol, MoE, Beijing, Peoples R China
[2] Beihang Univ, State Key Lab Complex & Crit Software Environm, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Code search; Contrastive learning; Deep neural network
DOI
10.1007/s10664-024-10603-z
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Code search is a critical software development activity that improves developer efficiency: it retrieves programs that satisfy the user's intent from a codebase. Recently, many researchers have applied contrastive learning to learn the semantic relationships between queries and code snippets, resulting in impressive performance in code search. Despite these improvements, existing models overlook two challenging scenarios. First, a good code search tool should retrieve all code snippets in a candidate pool that meet the given query, even when they are implemented in diverse ways, so that the retrieved programs suit developers' different programming styles. Second, in the open-source community, some programs have similar implementations but provide different functions. Code search engines need to distinguish the desired programs from these confusing code snippets, which have similar implementations but cannot meet the query. To address these limitations, we propose SCodeSearcher, a soft contrastive learning method for code search that highlights challenging examples by assigning them high weights in the contrastive learning objective according to their degree of difficulty. We conduct extensive experiments on five representative code search datasets covering code retrieval and code question answering tasks. The experimental results show that SCodeSearcher, trained on a much smaller corpus (less than one-tenth the size), achieves performance comparable to existing methods optimized on the large-scale dataset, significantly saving computing resources and training time.
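The abstract does not give the exact weighting formula, so the following is only a minimal sketch of the general idea of weighting hard examples inside a contrastive objective. It assumes an InfoNCE-style loss for a single query and a hypothetical weighting scheme (a softmax over the negatives' similarity scores, rescaled to mean 1). The function names, the `beta` and `temperature` parameters, and the scheme itself are illustrative assumptions, not the method from the paper. It covers only confusing negatives, not the diverse-positive scenario.

```python
import math

def info_nce(sim_pos, sim_negs, temperature=0.05, weights=None):
    """InfoNCE-style loss for one query.

    sim_pos:  similarity between the query and its matching code snippet.
    sim_negs: similarities between the query and non-matching snippets.
    weights:  optional per-negative multipliers (assumed, not from the paper).
    """
    if weights is None:
        weights = [1.0] * len(sim_negs)
    logits = [sim_pos / temperature] + [s / temperature for s in sim_negs]
    m = max(logits)  # subtract the max for numerical stability
    pos = math.exp(logits[0] - m)
    neg = sum(w * math.exp(l - m) for w, l in zip(weights, logits[1:]))
    return -math.log(pos / (pos + neg))

def hardness_weights(sim_negs, beta=1.0):
    """Hypothetical weighting: softmax over negative similarities, scaled to mean 1.

    Negatives that look more like the query (higher similarity) get larger
    weights; beta=0 gives uniform weights and recovers the unweighted loss.
    """
    m = max(sim_negs)
    exps = [math.exp(beta * (s - m)) for s in sim_negs]
    total = sum(exps)
    return [len(sim_negs) * e / total for e in exps]

# Usage: one confusing negative (0.8) and two easy ones.
negs = [0.8, 0.1, 0.0]
plain = info_nce(0.9, negs)
soft = info_nce(0.9, negs, weights=hardness_weights(negs, beta=5.0))
```

Under these assumptions the weighted loss is never smaller than the unweighted one, because the larger weights fall on the negatives that already contribute most to the denominator. This pushes the model harder to separate the query from look-alike snippets.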
Pages: 23