CORES: COde REpresentation Summarization for Code Search

Cited by: 0
Authors
Zhang, Xu [1]
Hu, Xiaoyu [1]
Zhou, Deyu [1]
Affiliations
[1] Southeast Univ, Sch Comp Sci & Engn, Key Lab New Generat Artificial Intelligence Techno, Minist Educ, Nanjing 210096, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Codes; Semantics; Feature extraction; Vectors; Redundancy; Training; Software development management; Code search; code representation; summarization;
DOI
10.1109/TCE.2024.3445139
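Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract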
With the growth of the consumer electronics market, the software development industry faces new opportunities and an increased focus on code retrieval techniques that improve efficiency and reduce costs. Code search aims to retrieve and reuse code from extensive repositories based on a search query with specific requirements. Recently, approaches based on pre-trained models have become popular because they capture the semantic representations of code snippets and search queries accurately. However, such approaches ignore the inconsistency between code and query statements caused by redundant tokens in the code snippets, such as definitions and punctuation marks, which hinders matching accuracy. To address this shortcoming, this paper proposes two strategies, based on explicit and implicit code representation summarization respectively. Summarizing the code representation removes the redundancy in the code and alleviates the inconsistency between code and query statements. In the explicit strategy, different views of contextual information are obtained and summarized using pyramidal dilated convolution at different scales. In the implicit strategy, a covariance constraint is applied directly to the code representation to ensure de-redundancy. Experimental results on six benchmark datasets show that both strategies of CORES outperform the current state-of-the-art model by 1.2% in average MRR score.
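As an illustration of two ideas named in the abstract, the sketch below shows one plausible form of a covariance-based redundancy penalty and the standard Mean Reciprocal Rank (MRR) metric. It is not the authors' implementation: the function names, the choice of penalising squared off-diagonal covariance entries, and the normalisation by the embedding dimension are all assumptions made for this example.

```python
from typing import List, Sequence


def covariance_penalty(batch: Sequence[Sequence[float]]) -> float:
    """Hypothetical de-redundancy term: sum of squared off-diagonal
    entries of the feature covariance matrix, divided by the dimension.
    `batch` holds one embedding per row and needs at least two rows."""
    n, d = len(batch), len(batch[0])
    if n < 2:
        raise ValueError("need at least two embeddings")
    # Centre every feature dimension across the batch.
    means = [sum(row[j] for row in batch) / n for j in range(d)]
    centered: List[List[float]] = [
        [row[j] - means[j] for j in range(d)] for row in batch
    ]
    penalty = 0.0
    for i in range(d):
        for j in range(d):
            if i != j:
                # Unbiased sample covariance between dimensions i and j.
                cov = sum(c[i] * c[j] for c in centered) / (n - 1)
                penalty += cov * cov
    return penalty / d


def rank_of_target(scores: Sequence[float], target_index: int) -> int:
    """1-based rank of the correct snippet; a higher score ranks earlier."""
    target = scores[target_index]
    return 1 + sum(1 for s in scores if s > target)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all queries (ranks are 1-based)."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


if __name__ == "__main__":
    correlated = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    decorrelated = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
    print(covariance_penalty(correlated))    # redundant dimensions
    print(covariance_penalty(decorrelated))  # independent dimensions
    print(mean_reciprocal_rank([1, 2, 4]))
```

Under these assumptions, embeddings whose dimensions duplicate one another receive a positive penalty, while uncorrelated dimensions receive none, which is the sense in which a covariance constraint can discourage redundancy.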
Pages: 6095-6104
Number of pages: 10