CoCoSoDa: Effective Contrastive Learning for Code Search

Cited by: 16
Authors
Shi, Ensheng [1 ]
Wang, Yanlin [2 ]
Gu, Wenchao [3 ]
Du, Lun [4 ]
Zhang, Hongyu [5 ]
Han, Shi [4 ]
Zhang, Dongmei [4 ]
Sun, Hongbin [1 ]
Affiliations
[1] Xi An Jiao Tong Univ, Xian, Peoples R China
[2] Sun Yat Sen Univ, Sch Software Engn, Guangzhou, Peoples R China
[3] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[4] Microsoft Res, Beijing, Peoples R China
[5] Chongqing Univ, Chongqing, Peoples R China
Funding
National Key R&D Program of China;
Keywords
code search; contrastive learning; soft data augmentation; momentum mechanism; COMPLETION;
DOI
10.1109/ICSE48619.2023.00185
CLC number
TP31 [Computer Software];
Subject classification code
081202 ; 0835 ;
Abstract
Code search aims to retrieve semantically relevant code snippets for a given natural language query. Recently, many approaches employing contrastive learning have shown promising results on code representation learning and have greatly improved the performance of code search. However, there is still considerable room for improvement in using contrastive learning for code search. In this paper, we propose CoCoSoDa to effectively utilize contrastive learning for code search via two key factors in contrastive learning: data augmentation and negative samples. Specifically, soft data augmentation dynamically masks tokens or replaces them with their types in input sequences to generate positive samples. A momentum mechanism is used to generate large and consistent representations of negative samples in a mini-batch by maintaining a queue and a momentum encoder. In addition, multimodal contrastive learning is used to pull together the representations of paired code-query examples and push apart those of unpaired code snippets and queries. We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset covering six programming languages. Experimental results show that: (1) CoCoSoDa outperforms 18 baselines and in particular exceeds CodeBERT, GraphCodeBERT, and UniXcoder by 13.3%, 10.5%, and 5.9% in average MRR scores, respectively. (2) Ablation studies show the effectiveness of each component of our approach. (3) We adapt our techniques to several different pre-trained models such as RoBERTa, CodeBERT, and GraphCodeBERT and observe a significant boost in their code search performance. (4) Our model performs robustly under different hyper-parameters. Furthermore, we perform qualitative and quantitative analyses to explore the reasons behind the good performance of our model.
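The two key mechanisms in the abstract can be sketched in a few lines of Python. This is a minimal illustration under assumed details (the type vocabulary, the masking probability `p`, and the EMA coefficient `m` are all hypothetical choices), not the authors' implementation:

```python
import random

MASK = "<mask>"

def soft_augment(tokens, token_types, p=0.15, seed=None):
    """Soft data augmentation: with probability p, dynamically mask a
    token or replace it with its type name, yielding a perturbed
    "positive" view of the same input sequence."""
    rng = random.Random(seed)
    out = []
    for tok, typ in zip(tokens, token_types):
        r = rng.random()
        if r < p / 2:
            out.append(MASK)   # dynamic masking
        elif r < p:
            out.append(typ)    # dynamic replacement with the token's type
        else:
            out.append(tok)    # keep the original token
    return out

def momentum_update(query_params, key_params, m=0.999):
    """MoCo-style exponential moving average: the momentum (key) encoder
    slowly tracks the query encoder, which keeps the queued
    negative-sample representations consistent across mini-batches."""
    return [m * k + (1.0 - m) * q for q, k in zip(query_params, key_params)]

code = ["def", "add", "(", "a", ",", "b", ")", ":"]
types = ["keyword", "identifier", "punct", "identifier", "punct",
         "identifier", "punct", "punct"]
view_a = soft_augment(code, types, seed=1)  # two stochastic views of one
view_b = soft_augment(code, types, seed=2)  # snippet form a positive pair
```

In a full system the two views would be encoded and pulled together by a contrastive loss, while representations dequeued from the momentum encoder serve as the large pool of negatives.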
Pages: 2198-2210
Page count: 13
Related Papers
50 records in total
  • [21] On the Effectiveness of Transfer Learning for Code Search
    Salza, Pasquale
    Schwizer, Christoph
    Gu, Jian
    Gall, Harald C.
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2023, 49 (04) : 1804 - 1822
  • [22] Reinforcement Learning of Code Search Sessions
    Li, Wei
    Yan, Shuhan
    Shen, Beijun
    Chen, Yuting
    2019 26TH ASIA-PACIFIC SOFTWARE ENGINEERING CONFERENCE (APSEC), 2019, : 458 - 465
  • [23] CONTRASTIVE ANALYSIS AS A GUIDE FOR EFFECTIVE TEACHING AND LEARNING OF ENGLISH
    Akpojisheri, Monday Ojevwe
    6TH INTERNATIONAL CONFERENCE OF EDUCATION, RESEARCH AND INNOVATION (ICERI 2013), 2013, : 4624 - 4628
  • [24] A Robust and Effective Text Detector Supervised by Contrastive Learning
    Wei, Ran
    Li, Yaoyi
    Li, Haiyan
    Tang, Ze
    Lu, Hongtao
    Cai, Nengbin
    Zhao, Xuejun
    IEEE ACCESS, 2021, 9 : 26431 - 26441
  • [25] Improving Bug Localization With Effective Contrastive Learning Representation
    Luo, Zhengmao
    Wang, Wenyao
    Cen, Caichun
    IEEE ACCESS, 2023, 11 : 32523 - 32533
  • [26] Effective sample pairs based contrastive learning for clustering
    Yin, Jun
    Wu, Haowei
    Sun, Shiliang
    INFORMATION FUSION, 2023, 99
  • [27] Contrastive Learning for User Sequence Representation in Personalized Product Search
    Dai, Shitong
    Liu, Jiongnan
    Dou, Zhicheng
    Wang, Haonan
    Liu, Lin
    Long, Bo
    Wen, Ji-Rong
    PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023, 2023, : 380 - 389
  • [28] Consistent prototype contrastive learning for weakly supervised person search
    Lin, Huadong
    Yu, Xiaohan
    Zhang, Pengcheng
    Bai, Xiao
    Zheng, Jin
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 105
  • [29] Query Expansion via Wordnet for Effective Code Search
    Lu, Meili
    Sun, Xiaobing
    Wang, Shaowei
    Lo, David
    Duan, Yucong
    2015 22ND INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION, AND REENGINEERING (SANER), 2015, : 545 - 549
  • [30] TransformCode: A Contrastive Learning Framework for Code Embedding via Subtree Transformation
    Xian, Zixiang
    Huang, Rubing
    Towey, Dave
    Fang, Chunrong
    Chen, Zhenyu
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2024, 50 (06) : 1600 - 1619