CoCoSoDa: Effective Contrastive Learning for Code Search

Cited by: 16
Authors
Shi, Ensheng [1]
Wang, Yanlin [2]
Gu, Wenchao [3]
Du, Lun [4]
Zhang, Hongyu [5]
Han, Shi [4]
Zhang, Dongmei [4]
Sun, Hongbin [1]
Affiliations
[1] Xi An Jiao Tong Univ, Xian, Peoples R China
[2] Sun Yat Sen Univ, Sch Software Engn, Guangzhou, Peoples R China
[3] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[4] Microsoft Res, Beijing, Peoples R China
[5] Chongqing Univ, Chongqing, Peoples R China
Source
2023 IEEE/ACM 45TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ICSE | 2023
Funding
National Key Research and Development Program of China;
Keywords
code search; contrastive learning; soft data augmentation; momentum mechanism; COMPLETION;
DOI
10.1109/ICSE48619.2023.00185
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Code search aims to retrieve semantically relevant code snippets for a given natural language query. Recently, many approaches employing contrastive learning have shown promising results on code representation learning and greatly improved the performance of code search. However, there is still considerable room for improvement in using contrastive learning for code search. In this paper, we propose CoCoSoDa to effectively utilize contrastive learning for code search via two key factors in contrastive learning: data augmentation and negative samples. Specifically, soft data augmentation dynamically masks some tokens in the input sequences, or replaces them with their types, to generate positive samples. A momentum mechanism is used to generate large and consistent representations of negative samples in a mini-batch by maintaining a queue and a momentum encoder. In addition, multimodal contrastive learning is used to pull together the representations of code-query pairs and push apart unpaired code snippets and queries. We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages. Experimental results show that: (1) CoCoSoDa outperforms 18 baselines and, in particular, exceeds CodeBERT, GraphCodeBERT, and UniXcoder by 13.3%, 10.5%, and 5.9% on average MRR scores, respectively. (2) The ablation studies show the effectiveness of each component of our approach. (3) We adapt our techniques to several different pre-trained models such as RoBERTa, CodeBERT, and GraphCodeBERT and observe a significant boost in their performance in code search. (4) Our model performs robustly under different hyper-parameters. Furthermore, we perform qualitative and quantitative analyses to explore the reasons behind the good performance of our model.
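The three mechanisms named in the abstract (soft data augmentation, a momentum encoder with a queue of negatives, and a contrastive loss over code-query pairs) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the `[MASK]` token, the 50/50 choice between masking and type replacement, and the values `rate=0.15`, `m=0.999`, and `temperature=0.07` are all assumptions made for illustration, and encoder parameters are stood in for by flat lists of floats.

```python
import math
import random

MASK = "[MASK]"  # hypothetical mask token, not taken from the paper


def soft_augment(tokens, types, rate=0.15, rng=random):
    """With probability `rate`, mask a token or replace it by its type."""
    out = []
    for tok, typ in zip(tokens, types):
        if rng.random() < rate:
            # Assumed 50/50 split between masking and type replacement.
            out.append(MASK if rng.random() < 0.5 else f"<{typ}>")
        else:
            out.append(tok)
    return out


def momentum_update(online, momentum, m=0.999):
    """Moving average: momentum <- m * momentum + (1 - m) * online."""
    return [m * pm + (1.0 - m) * po for po, pm in zip(online, momentum)]


def enqueue(queue, new_keys, max_size):
    """Append the newest key vectors and keep only the last `max_size`."""
    return (queue + new_keys)[-max_size:]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(query, positive, negatives, temperature=0.07):
    """Negative log-probability of the positive among all candidates."""
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, n) / temperature for n in negatives]
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[0]
```

In this sketch, a training step would augment a code snippet and its query, encode both views, score each query against its paired code (the positive) and the queued representations (the negatives) with `info_nce`, then call `momentum_update` and `enqueue`. Keeping `m` close to 1 makes the momentum encoder change slowly, which is what keeps the queued negatives consistent with one another.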
Pages: 2198 - 2210
Number of pages: 13
Related papers
50 items in total
  • [31] Advanced code time complexity prediction approach using contrastive learning
    Park, Shinwoo
    Hahn, Joonghyuk
    Orwig, Elizabeth
    Ko, Sang-Ki
    Han, Yo-Sub
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2025, 151
  • [32] When Deep Learning Met Code Search
    Cambronero, Jose
    Li, Hongyu
    Kim, Seohyun
    Sen, Koushik
    Chandra, Satish
    ESEC/FSE'2019: PROCEEDINGS OF THE 2019 27TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING, 2019, : 964 - 974
  • [33] Survey of Code Search Based on Deep Learning
    Xie, Yutao
    Lin, Jiayi
    Dong, Hande
    Zhang, Lei
    Wu, Zhonghai
    ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 2024, 33 (02)
  • [34] Towards Effective and Robust Graph Contrastive Learning With Graph Autoencoding
    Li, Wen-Zhi
    Wang, Chang-Dong
    Lai, Jian-Huang
    Yu, Philip S.
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (02) : 868 - 881
  • [35] An effective negative sampling approach for contrastive learning of sentence embedding
    Tan, Qitao
    Song, Xiaoying
    Ye, Guanghui
    Wu, Chuan
    MACHINE LEARNING, 2023, 112 (12) : 4837 - 4861
  • [36] An effective negative sampling approach for contrastive learning of sentence embedding
    Qitao Tan
    Xiaoying Song
    Guanghui Ye
    Chuan Wu
    Machine Learning, 2023, 112 : 4837 - 4861
  • [37] Toward Effective Image Manipulation Detection With Proposal Contrastive Learning
    Zeng, Yuyuan
    Zhao, Bowen
    Qiu, Shanzhao
    Dai, Tao
    Xia, Shu-Tao
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 33 (09) : 4703 - 4714
  • [38] Improving Cancer Metastasis Detection via Effective Contrastive Learning
    Zheng, Haixia
    Zhou, Yu
    Huang, Xin
    MATHEMATICS, 2022, 10 (14)
  • [39] Detecting Source Code Vulnerabilities using High-Precision Code Representation and Bimodal Contrastive Learning
    Wang, Jie
    Xu, Mengru
    Chen, Hao
    2024 INTERNATIONAL CONFERENCE ON NETWORKING AND NETWORK APPLICATIONS, NANA 2024, 2024, : 536 - 541
  • [40] CLAVE: A deep learning model for source code authorship verification with contrastive learning and transformer encoders
    Álvarez-Fidalgo, David
    Ortin, Francisco
    INFORMATION PROCESSING & MANAGEMENT, 2025, 62 (03)