CoCoSoDa: Effective Contrastive Learning for Code Search

Cited by: 16
|
Authors
Shi, Ensheng [1 ]
Wang, Yanlin [2 ]
Gu, Wenchao [3 ]
Du, Lun [4 ]
Zhang, Hongyu [5 ]
Han, Shi [4 ]
Zhang, Dongmei [4 ]
Sun, Hongbin [1 ]
Affiliations
[1] Xi An Jiao Tong Univ, Xian, Peoples R China
[2] Sun Yat Sen Univ, Sch Software Engn, Guangzhou, Peoples R China
[3] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[4] Microsoft Res, Beijing, Peoples R China
[5] Chongqing Univ, Chongqing, Peoples R China
Source
2023 IEEE/ACM 45TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ICSE | 2023
Funding
National Key Research and Development Program of China;
Keywords
code search; contrastive learning; soft data augmentation; momentum mechanism; COMPLETION;
DOI
10.1109/ICSE48619.2023.00185
CLC Number (Chinese Library Classification)
TP31 [Computer Software];
Subject Classification Code
081202; 0835
Abstract
Code search aims to retrieve semantically relevant code snippets for a given natural language query. Recently, many approaches employing contrastive learning have shown promising results on code representation learning and greatly improved the performance of code search. However, there is still a lot of room for improvement in using contrastive learning for code search. In this paper, we propose CoCoSoDa to effectively utilize contrastive learning for code search via two key factors in contrastive learning: data augmentation and negative samples. Specifically, soft data augmentation dynamically masks or replaces some tokens with their types in the input sequences to generate positive samples. A momentum mechanism is used to generate a large and consistent set of negative-sample representations for a mini-batch by maintaining a queue and a momentum encoder. In addition, multimodal contrastive learning is used to pull together the representations of code-query pairs and push apart the unpaired code snippets and queries. We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages. Experimental results show that: (1) CoCoSoDa outperforms 18 baselines and especially exceeds CodeBERT, GraphCodeBERT, and UniXcoder by 13.3%, 10.5%, and 5.9% on average MRR scores, respectively. (2) The ablation studies show the effectiveness of each component of our approach. (3) We adapt our techniques to several different pre-trained models such as RoBERTa, CodeBERT, and GraphCodeBERT and observe a significant boost in their performance in code search. (4) Our model performs robustly under different hyper-parameters. Furthermore, we perform qualitative and quantitative analyses to explore the reasons behind the good performance of our model.
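The abstract describes two mechanisms: soft data augmentation (dynamically masking or replacing tokens with their types to form positive views) and MoCo-style momentum contrastive learning with a queue of negative samples. Below is a minimal, illustrative PyTorch sketch of these ideas, not the authors' implementation; the toy encoder, token/type ids, masking probability, queue size, momentum, and temperature are all assumed placeholders.

    import random
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    MASK_ID = 0            # hypothetical [MASK] token id
    TYPE_IDS = [1, 2, 3]   # hypothetical token-type ids (e.g., identifier, keyword, operator)

    def soft_augment(token_ids, p=0.15):
        """Soft data augmentation: each token is dynamically masked or replaced
        by a type id with probability p, yielding a positive view of the input."""
        out = []
        for tok in token_ids:
            if random.random() < p:
                out.append(MASK_ID if random.random() < 0.5 else random.choice(TYPE_IDS))
            else:
                out.append(tok)
        return out

    class ToyEncoder(nn.Module):
        """Stand-in for a pre-trained code/query encoder (e.g., UniXcoder)."""
        def __init__(self, vocab_size=1000, dim=128):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, dim)
        def forward(self, ids):                                    # ids: (batch, seq_len)
            return F.normalize(self.emb(ids).mean(dim=1), dim=-1)  # mean-pooled, L2-normalized

    class MomentumContrast(nn.Module):
        """MoCo-style multimodal contrastive learning: paired query/code embeddings are
        pulled together, while a queue of momentum-encoded keys supplies a large and
        consistent pool of negatives."""
        def __init__(self, dim=128, queue_size=4096, m=0.999, tau=0.07):
            super().__init__()
            self.q_enc, self.k_enc = ToyEncoder(dim=dim), ToyEncoder(dim=dim)
            self.k_enc.load_state_dict(self.q_enc.state_dict())
            for p in self.k_enc.parameters():
                p.requires_grad = False                            # key encoder is updated only by momentum
            self.register_buffer("queue", F.normalize(torch.randn(queue_size, dim), dim=-1))
            self.m, self.tau = m, tau

        @torch.no_grad()
        def _momentum_update(self):
            for pq, pk in zip(self.q_enc.parameters(), self.k_enc.parameters()):
                pk.data = self.m * pk.data + (1.0 - self.m) * pq.data

        def forward(self, query_ids, code_ids):
            q = self.q_enc(query_ids)                   # query embeddings
            with torch.no_grad():
                self._momentum_update()
                k = self.k_enc(code_ids)                # paired code embeddings (keys)
            pos = (q * k).sum(dim=-1, keepdim=True)     # similarity with the paired code
            neg = q @ self.queue.t()                    # similarity with queued negatives
            logits = torch.cat([pos, neg], dim=1) / self.tau
            labels = torch.zeros(q.size(0), dtype=torch.long, device=logits.device)  # positives at index 0
            self.queue = torch.cat([k, self.queue])[: self.queue.size(0)].detach()   # enqueue new keys, drop oldest
            return F.cross_entropy(logits, labels)      # InfoNCE loss

A usage sketch under the same assumptions: model = MomentumContrast(); loss = model(torch.randint(3, 1000, (8, 16)), torch.randint(3, 1000, (8, 32))), where the two tensors stand in for tokenized queries and code snippets.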
Pages: 2198-2210
Number of pages: 13
Related Papers
50 records in total
  • [41] CLAVE: A deep learning model for source code authorship verification with contrastive learning and transformer encoders
    Álvarez-Fidalgo, David
    Ortin, Francisco
    Information Processing and Management, 3
  • [42] Supervised Contrastive Learning with Nearest Neighbor Search for Speech Emotion Recognition
    Wang, Xuechen
    Zhao, Shiwan
    Qin, Yong
    INTERSPEECH 2023, 2023, : 1913 - 1917
  • [43] PSSL: Self-supervised Learning for Personalized Search with Contrastive Sampling
    Zhou, Yujia
    Dou, Zhicheng
    Zhu, Yutao
    Wen, Ji-Rong
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 2749 - 2758
  • [44] ContraBERT: Enhancing Code Pre-trained Models via Contrastive Learning
    Liu, Shangqing
    Wu, Bozhi
    Xie, Xiaofei
    Meng, Guozhu
    Liu, Yang
    2023 IEEE/ACM 45TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ICSE, 2023, : 2476 - 2487
  • [45] Joint Embedding of Semantic and Statistical Features for Effective Code Search
    Kong, Xianglong
    Kong, Supeng
    Yu, Ming
    Du, Chengjie
    APPLIED SCIENCES-BASEL, 2022, 12 (19):
  • [46] Evaluating few-shot and contrastive learning methods for code clone detection
    Khajezade, Mohamad
    Fard, Fatemeh H.
    Shehata, Mohamed S.
    EMPIRICAL SOFTWARE ENGINEERING, 2024, 29 (06)
  • [47] Evaluating Few-Shot and Contrastive Learning Methods for Code Clone Detection
    Khajezade, Mohamad
    Fard, Fatemeh Hendijani
    Shehata, Mohamed S.
    arXiv, 2022,
  • [48] BinCola: Diversity-Sensitive Contrastive Learning for Binary Code Similarity Detection
    Jiang, Shuai
    Fu, Cai
    He, Shuai
    Lv, Jianqiang
    Han, Lansheng
    Hu, Hong
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2024, 50 (10) : 2485 - 2497
  • [49] Towards Effective Context for Meta-Reinforcement Learning: an Approach based on Contrastive Learning
    Fu, Haotian
    Tang, Hongyao
    Hao, Jianye
    Chen, Chen
    Feng, Xidong
    Li, Dong
    Liu, Wulong
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 7457 - 7465
  • [50] Heterogeneous data augmentation in graph contrastive learning for effective negative samples
    Ali, Adnan
    Li, Jinlong
    Chen, Huanhuan
    COMPUTERS & ELECTRICAL ENGINEERING, 2024, 118