CORES: COde REpresentation Summarization for Code Search

Times cited: 0
Authors
Zhang, Xu [1 ]
Hu, Xiaoyu [1 ]
Zhou, Deyu [1 ]
Affiliation
[1] Southeast Univ, Sch Comp Sci & Engn, Key Lab New Generat Artificial Intelligence Techno, Minist Educ, Nanjing 210096, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Codes; Semantics; Feature extraction; Vectors; Redundancy; Training; Software development management; Code search; code representation; summarization;
DOI
10.1109/TCE.2024.3445139
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline code
0808; 0809
Abstract
With the growth of the consumer electronics market, the software development industry faces new opportunities, and code retrieval techniques are receiving increased attention as a way to improve efficiency and reduce costs. Code search aims to retrieve reusable code from large repositories given a search query that states specific requirements. Recently, approaches based on pre-trained models have become popular because they capture accurate semantic representations of both code snippets and search queries. However, these approaches ignore the inconsistency between code and query statements caused by redundant tokens in code snippets, such as definitions and punctuation marks, which reduces matching accuracy. To address this limitation, this paper proposes two strategies, based on explicit and implicit code representation summarization respectively. Summarizing the code representation removes redundancy from the code and alleviates the inconsistency between code and query statements. In the explicit strategy, pyramidal dilated convolutions at different scales capture different views of contextual information, which are then summarized. In the implicit strategy, a covariance constraint is applied directly to the code representation to remove redundancy. Experimental results on six benchmark datasets show that both strategies outperform the current state-of-the-art model by 1.2% on average in MRR.
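The abstract names three concrete mechanisms: multi-scale dilated convolution for explicit summarization, a covariance constraint for implicit de-redundancy, and MRR as the evaluation metric. The pure-Python sketch below illustrates each one under stated assumptions; it is not the authors' implementation. The function names, the shared kernel, the dilation rates (1, 2, 4), mean pooling, averaging across scales, and the VICReg-style off-diagonal covariance penalty are all hypothetical choices, since the abstract does not specify them.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # rows = tokens (or batch items), columns = feature dims


def dilated_conv1d(seq: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Same-length 1-D dilated convolution (cross-correlation form) with zero padding."""
    k = len(kernel)
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - k // 2) * dilation  # taps spread apart by the dilation rate
            if 0 <= idx < len(seq):           # positions outside the sequence count as zero
                acc += w * seq[idx]
        out.append(acc)
    return out


def pyramid_summary(h: Matrix, kernel: Sequence[float], dilations=(1, 2, 4)) -> List[float]:
    """Hypothetical explicit summarization: one view per dilation scale, then merge.

    Each feature channel of the token matrix h (n tokens x d dims) is convolved
    at every dilation rate and mean-pooled over tokens; the per-scale views are
    then averaged into a single d-dimensional code vector (an assumption).
    """
    n, d = len(h), len(h[0])
    views = []
    for r in dilations:
        view = []
        for c in range(d):
            channel = [h[t][c] for t in range(n)]
            view.append(sum(dilated_conv1d(channel, kernel, r)) / n)  # mean-pool
        views.append(view)
    return [sum(v[c] for v in views) / len(views) for c in range(d)]


def covariance_penalty(z: Matrix) -> float:
    """Hypothetical implicit constraint: penalize off-diagonal feature covariance.

    z is a batch of code representations (b items x d dims). Squared off-diagonal
    entries of the sample covariance are summed and divided by d (VICReg-style).
    """
    b, d = len(z), len(z[0])
    if b < 2:
        raise ValueError("need at least two samples to estimate covariance")
    means = [sum(row[c] for row in z) / b for c in range(d)]
    zc = [[row[c] - means[c] for c in range(d)] for row in z]  # center each dim
    penalty = 0.0
    for i in range(d):
        for j in range(d):
            if i != j:
                cov = sum(row[i] * row[j] for row in zc) / (b - 1)
                penalty += cov * cov
    return penalty / d


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """MRR: average of 1/rank of the correct code snippet for each query (rank 1 = best)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


if __name__ == "__main__":
    tokens = [[1.0], [2.0], [3.0], [4.0], [5.0]]
    print(pyramid_summary(tokens, [1.0, 1.0, 1.0], dilations=(1, 2)))
    print(covariance_penalty([[1.0, 1.0], [-1.0, -1.0]]))
    print(mean_reciprocal_rank([1, 2, 4]))
```

In a training loop built on this sketch, `covariance_penalty` would be added to the retrieval loss with some weight (also an assumption), pushing feature dimensions of the code representation to carry non-redundant information, while `mean_reciprocal_rank` is only used at evaluation time.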
Pages: 6095-6104
Number of pages: 10