CORES: COde REpresentation Summarization for Code Search

Cited by: 0

Authors:

Zhang, Xu [1]

Hu, Xiaoyu [1]

Zhou, Deyu [1]

Affiliations:

[1] Southeast Univ, Sch Comp Sci & Engn, Key Lab New Generat Artificial Intelligence Techno, Minist Educ, Nanjing 210096, Peoples R China

Source:

IEEE TRANSACTIONS ON CONSUMER ELECTRONICS | 2024 / Vol. 70 / No. 3

Funding:

National Natural Science Foundation of China

Keywords:

Codes; Semantics; Feature extraction; Vectors; Redundancy; Training; Software development management; Code search; code representation; summarization;

DOI:

10.1109/TCE.2024.3445139

Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

With the growth of the consumer electronics market, the software development industry faces new opportunities and an increased focus on code retrieval techniques that improve efficiency and reduce costs. Code search aims to retrieve and reuse code from extensive repositories based on a search query with specific requirements. Recently, approaches based on pre-trained models have become popular because they accurately capture the semantic representations of code snippets and search queries. However, such approaches ignore the inconsistency between code and query statements caused by redundant tokens in the code snippets, such as definitions and punctuation marks, which hinders matching accuracy. To address this shortcoming, this paper proposes two strategies based on explicit or implicit code representation summarization. Summarizing the code representation removes the redundancy in the code and alleviates the inconsistency between code and query statements. In the explicit strategy, different views of contextual information are obtained and summarized using pyramidal dilated convolutions at different scales. In the implicit strategy, a covariance constraint is applied directly to the code representation to ensure de-redundancy. Experimental results on six benchmark datasets show that both strategies of the proposed CORES model outperform the current state-of-the-art model by 1.2% in average MRR.
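The abstract reports results as average MRR (mean reciprocal rank), the standard code search metric: for each query, take the reciprocal of the rank at which the correct snippet is retrieved, then average over queries. The following is a minimal sketch of that computation, not the authors' evaluation code; the function names `rank_of_target` and `mean_reciprocal_rank` and the similarity scores are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def rank_of_target(scores: Sequence[float], target_index: int) -> int:
    """Return the 1-based rank of the correct snippet for one query.

    `scores` holds hypothetical query-to-snippet similarity scores;
    the rank is one plus the number of snippets scored strictly higher.
    """
    target = scores[target_index]
    return 1 + sum(1 for s in scores if s > target)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all queries (ranks are 1-based)."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks must be 1-based positive integers")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Illustrative data only: three queries over three candidate snippets each.
example_scores = [
    ([0.9, 0.2, 0.1], 0),  # correct snippet ranked 1st
    ([0.2, 0.9, 0.5], 2),  # correct snippet ranked 2nd
    ([0.7, 0.6, 0.1], 2),  # correct snippet ranked 3rd
]
example_ranks = [rank_of_target(s, t) for s, t in example_scores]
example_mrr = mean_reciprocal_rank(example_ranks)
```

With the illustrative data above, the ranks are 1, 2 and 3, so the MRR is the mean of 1, 1/2 and 1/3.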


Pages: 6095-6104

Page count: 10
