CORES: COde REpresentation Summarization for Code Search

Citations: 0

Authors:

Zhang, Xu [1]

Hu, Xiaoyu [1]

Zhou, Deyu [1]

Affiliation:

[1] Southeast Univ, Sch Comp Sci & Engn, Key Lab New Generat Artificial Intelligence Techno, Minist Educ, Nanjing 210096, Peoples R China

Source:

IEEE TRANSACTIONS ON CONSUMER ELECTRONICS | 2024 / Vol. 70 / Issue 03

Funding:

National Natural Science Foundation of China;

Keywords:

Codes; Semantics; Feature extraction; Vectors; Redundancy; Training; Software development management; Code search; code representation; summarization;

DOI:

10.1109/TCE.2024.3445139

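Chinese Library Classification number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code: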

0808 ; 0809 ;

Abstract:

With the growth of the consumer electronics market, the software development industry faces new opportunities and an increased focus on code retrieval techniques that improve efficiency and reduce costs. Code search aims to retrieve and reuse code from extensive repositories based on a search query with specific requirements. Recently, approaches based on pre-trained models have become popular because they accurately capture the semantic representations of code snippets and search queries. However, such approaches ignore the inconsistency between code and query statements caused by redundant tokens in the code snippets, such as definitions and punctuation marks, which hinders matching accuracy. To address this problem, this paper proposes two strategies based on explicit or implicit code representation summarization. Summarizing the code representation removes redundancy in the code and alleviates the inconsistency between code and query statements. In the explicit strategy, different views of contextual information are obtained and summarized using pyramidal dilated convolutions at different scales. In the implicit strategy, a covariance constraint is applied directly to the code representation to reduce redundancy. Experimental results on six benchmark datasets show that both strategies outperform the current state-of-the-art model by 1.2% in average MRR.
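The abstract names two quantities without giving formulas: the MRR metric used for evaluation and a covariance-based constraint on the code representation. The sketch below illustrates both under stated assumptions. MRR follows its standard definition (the mean of 1/rank of the correct snippet per query). The covariance penalty is only one common way to discourage redundant feature dimensions (sum of squared off-diagonal covariance entries) and is not confirmed to be the loss used in the paper; the function names are hypothetical.

```python
from typing import List, Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Standard MRR: average of 1/rank, where rank is the 1-based
    position of the correct code snippet for each query."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based and must be >= 1")
    return sum(1.0 / r for r in ranks) / len(ranks)


def off_diagonal_covariance_penalty(batch: List[List[float]]) -> float:
    """Assumed form of a de-redundancy term (not taken from the paper):
    sum of squared off-diagonal entries of the sample covariance matrix
    computed over a batch of representation vectors."""
    n = len(batch)
    if n < 2:
        raise ValueError("need at least two vectors to estimate covariance")
    d = len(batch[0])
    # Center each feature dimension across the batch.
    means = [sum(row[j] for row in batch) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in batch]
    penalty = 0.0
    for i in range(d):
        for j in range(d):
            if i != j:
                cov_ij = sum(c[i] * c[j] for c in centered) / (n - 1)
                penalty += cov_ij * cov_ij
    return penalty


if __name__ == "__main__":
    # Correct snippets ranked 1st, 2nd and 4th for three queries.
    print(mean_reciprocal_rank([1, 2, 4]))
    # Two perfectly correlated dimensions give a non-zero penalty.
    print(off_diagonal_covariance_penalty([[1.0, 1.0], [-1.0, -1.0]]))
```

A penalty of zero means the feature dimensions are uncorrelated across the batch; larger values indicate that dimensions carry overlapping information, which is the kind of redundancy the implicit strategy is described as reducing.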


Pages: 6095-6104

Page count: 10
