SCodeSearcher: soft contrastive learning for code search

Times cited: 0
Authors
Li, Jia [1]
Fang, Zheng [1]
Shi, Xianjie [1]
Jin, Zhi [1]
Liu, Fang [2]
Li, Jia [1]
Zhao, Yunfei [1]
Li, Ge [1]
Affiliations
[1] Peking Univ, Key Lab High Confidence Software Technol, MoE, Beijing, Peoples R China
[2] Beihang Univ, State Key Lab Complex & Crit Software Environm, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Code search; Contrastive learning; Deep neural network
DOI
10.1007/s10664-024-10603-z
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Code search is a critical software development activity for improving developer efficiency: it retrieves programs from a codebase that satisfy the user's intent. Recently, many researchers have applied contrastive learning to learn the semantic relationships between queries and code snippets, achieving impressive code search performance. Despite these improvements, such models overlook two challenging scenarios. First, a good code search tool should retrieve from a candidate pool all code snippets that satisfy the given query, even when they are implemented in diverse ways, so that the results suit developers' different programming styles. Second, in the open-source community, some programs have similar implementations but provide different functionality. Code search engines need to distinguish the desired programs from such confusing code snippets, which resemble correct answers in implementation but do not satisfy the query. To address these limitations, we propose SCodeSearcher, a soft contrastive learning method for code search that emphasizes challenging examples by assigning them higher weights in the contrastive learning objective according to their degree of difficulty. We conduct extensive experiments on five representative code search datasets covering code retrieval and code question answering tasks. The experimental results show that SCodeSearcher, trained on a much smaller corpus (less than one-tenth the size), achieves performance comparable to existing methods optimized on large-scale datasets, significantly saving computing resources and training time.
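The abstract states the core idea only in words: challenging examples receive higher weights in the contrastive objective according to how difficult they are. The Python sketch below illustrates one way a difficulty-weighted (soft) InfoNCE loss for a single query could look. It is not SCodeSearcher's published formulation: the softmax-based weighting scheme, the helper names (`cosine`, `logsumexp`, `soft_contrastive_loss`), and the hyperparameters `tau` (temperature) and `beta` (weight sharpness) are all hypothetical choices made for illustration. Hard positives (correct code that looks unlike the query, e.g. an unusual implementation style) and hard negatives (wrong code that looks like the query) both receive larger weights.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def soft_contrastive_loss(
    query: Vector,
    positives: List[Vector],
    negatives: List[Vector],
    tau: float = 0.05,  # hypothetical temperature
    beta: float = 1.0,  # hypothetical weight sharpness; 0 disables weighting
) -> float:
    """Hypothetical difficulty-weighted InfoNCE loss for one query."""
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    pos = [cosine(query, p) / tau for p in positives]
    neg = [cosine(query, n) / tau for n in negatives]

    # Positive weights: softmax over -beta * sim, so a positive that is LESS
    # similar to the query (a harder positive) gets MORE weight; they sum to 1.
    pos_norm = logsumexp([-beta * s for s in pos])
    pos_w = [math.exp(-beta * s - pos_norm) for s in pos]

    # Negative log-weights: N * softmax(beta * sim), so a negative that is MORE
    # similar to the query (a harder negative) gets MORE weight; mean weight is 1.
    neg_norm = logsumexp([beta * s for s in neg])
    log_neg_w = [math.log(len(neg)) + beta * s - neg_norm for s in neg]

    loss = 0.0
    for w_p, s_p in zip(pos_w, pos):
        # -log(e^{s_p} / (e^{s_p} + sum_n w_n * e^{s_n})), evaluated in log space.
        log_denom = logsumexp([s_p] + [lw + s for lw, s in zip(log_neg_w, neg)])
        loss += w_p * (log_denom - s_p)
    return loss


if __name__ == "__main__":
    query = [1.0, 0.0]
    positives = [[1.0, 0.0], [0.6, 0.8]]  # second: a "diverse" correct implementation
    negatives = [[0.8, 0.6], [0.0, 1.0]]  # first: a confusing look-alike
    print(soft_contrastive_loss(query, positives, negatives, tau=1.0, beta=0.0))
    print(soft_contrastive_loss(query, positives, negatives, tau=1.0, beta=2.0))
```

With `beta=0` every weight is uniform and the function reduces to a plain multi-positive InfoNCE loss; in the toy example above, raising `beta` to 2 yields a larger loss because more of it is concentrated on the hard positive and the look-alike negative.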
Pages: 23
Related papers (50 in total)
  • [1] CoCoSoDa: Effective Contrastive Learning for Code Search
    Shi, Ensheng
    Wang, Yanlin
    Gu, Wenchao
    Du, Lun
    Zhang, Hongyu
    Han, Shi
    Zhang, Dongmei
    Sun, Hongbin
    2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), 2023, pp. 2198-2210
  • [2] Cross-Modal Contrastive Learning for Code Search
    Shi, Zejian
    Xiong, Yun
    Zhang, Xiaolong
    Zhang, Yao
    Li, Shanshan
    Zhu, Yangyong
    2022 IEEE International Conference on Software Maintenance and Evolution (ICSME 2022), 2022, pp. 94-105
  • [3] Improving Code Search with Multi-Modal Momentum Contrastive Learning
    Shi, Zejian
    Xiong, Yun
    Zhang, Yao
    Jiang, Zhijie
    Zhao, Jinjing
    Wang, Lei
    Li, Shanshan
    2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC), 2023, pp. 280-291
  • [4] Contrastive Learning with Keyword-based Data Augmentation for Code Search and Code Question Answering
    Park, Shinwoo
    Kim, Youngwook
    Han, Yo-Sub
    17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023), 2023, pp. 3609-3619
  • [5] Contrastive Code Representation Learning
    Jain, Paras
    Jain, Ajay
    Zhang, Tianjun
    Abbeel, Pieter
    Gonzalez, Joseph E.
    Stoica, Ion
    2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), 2021, pp. 5954-5971
  • [6] Effective Hard Negative Mining for Contrastive Learning-Based Code Search
    Fan, Ye
    Li, Chuanyi
    Ge, Jidong
    Huang, Liguo
    Luo, Bin
    ACM Transactions on Software Engineering and Methodology, 2025, 34(3)
  • [7] Adaptive Soft Contrastive Learning
    Feng, Chen
    Patras, Ioannis
    2022 26th International Conference on Pattern Recognition (ICPR), 2022, pp. 2721-2727
  • [8] FMCS: Improving Code Search by Multi-Modal Representation Fusion and Momentum Contrastive Learning
    Liu, Wenjie
    Chen, Gong
    Xie, Xiaoyuan
    2024 IEEE 24th International Conference on Software Quality, Reliability and Security (QRS), 2024, pp. 632-638
  • [9] Soft Contrastive Learning for Time Series
    Lee, Seunghan
    Park, Taeyoung
    Lee, Kibok
    12th International Conference on Learning Representations (ICLR 2024), 2024
  • [10] Soft Contrastive Learning for Visual Localization
    Thoma, Janine
    Paudel, Danda Pani
    Van Gool, Luc
    Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020