CORES: COde REpresentation Summarization for Code Search

Authors
Zhang, Xu [1]
Hu, Xiaoyu [1]
Zhou, Deyu [1]
Affiliations
[1] Southeast Univ, Sch Comp Sci & Engn, Key Lab New Generat Artificial Intelligence Techno, Minist Educ, Nanjing 210096, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Codes; Semantics; Feature extraction; Vectors; Redundancy; Training; Software development management; Code search; code representation; summarization;
DOI
10.1109/TCE.2024.3445139
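Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract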
With the growth of the consumer electronics market, the software development industry faces new opportunities and an increased focus on code retrieval techniques to improve efficiency and reduce costs. Code search aims to retrieve and reuse code from extensive repositories based on a search query with specific requirements. Recently, approaches based on pre-trained models have become popular because they capture the semantic representations of code snippets and search queries accurately. However, such approaches ignore the inconsistency between code and query statements caused by redundant tokens in the code snippets, such as definitions and punctuation marks, which hinders matching accuracy. To address this drawback, this paper proposes two strategies based on explicit or implicit code representation summarization. Summarizing the code representation removes redundancy in the code and alleviates the inconsistency between code and query statements. In the explicit strategy, different views of contextual information are obtained and summarized using pyramidal dilated convolution at different scales. In the implicit strategy, a covariance constraint is applied directly to the code representation to reduce redundancy. Experimental results on six benchmark datasets show that both strategies outperform the current state-of-the-art model by 1.2% in average MRR.
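Two quantities named in the abstract can be illustrated with a small sketch. The first function computes mean reciprocal rank (MRR), the standard retrieval metric behind the reported scores. The second shows one common way to penalize redundancy through covariance, by summing the squared off-diagonal entries of the feature covariance matrix. The abstract does not give the paper's exact loss, so the second function is an assumption for illustration only, and both function names are hypothetical.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank, where rank is the 1-based position of the
    correct code snippet in the results returned for each query."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


def off_diagonal_covariance_penalty(batch: Sequence[Sequence[float]]) -> float:
    """Sum of squared off-diagonal covariance entries across feature
    dimensions. Assumed form of a de-redundancy constraint, not the
    paper's stated loss. It is zero when dimensions are uncorrelated."""
    n, d = len(batch), len(batch[0])
    if n < 2:
        raise ValueError("need at least two vectors")
    # Center every feature dimension over the batch.
    means = [sum(row[j] for row in batch) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in batch]
    penalty = 0.0
    for i in range(d):
        for j in range(d):
            if i != j:
                # Unbiased sample covariance between dimensions i and j.
                cov = sum(c[i] * c[j] for c in centered) / (n - 1)
                penalty += cov ** 2
    return penalty


if __name__ == "__main__":
    # Correct snippet ranked 1st, 2nd and 4th for three queries.
    print(mean_reciprocal_rank([1, 2, 4]))
    # Two perfectly correlated dimensions give a non-zero penalty.
    print(off_diagonal_covariance_penalty([[1, 1], [2, 2], [3, 3]]))
```

In a training setup, a penalty of this kind would typically be added to the matching loss with a small weight, so that the code representation keeps fewer correlated (redundant) dimensions.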
Pages: 6095-6104
Number of pages: 10
Related papers
50 in total
  • [41] Leveraging meta-data of code for adapting prompt tuning for code summarization
    Jiang, Zhihua
    Wang, Di
    Rao, Dongning
    APPLIED INTELLIGENCE, 2025, 55 (02)
  • [42] Chain-of-Descriptions: Improving Code LLMs for VHDL Code Generation and Summarization
    Vijayaraghavan, Prashanth
    Nitsure, Apoorva
    Mackin, Charles
    Shi, Luyao
    Ambrogio, Stefano
    Haran, Arvind
    Paruthi, Viresh
    Elzein, Ali
    Coops, Dan
    Beymer, David
    Baldwin, Tyler
    Degan, Ehsan
    PROCEEDINGS OF THE 2024 ACM/IEEE INTERNATIONAL SYMPOSIUM ON MACHINE LEARNING FOR CAD, MLCAD 2024, 2024,
  • [43] Esale: Enhancing Code-Summary Alignment Learning for Source Code Summarization
    Fang, Chunrong
    Sun, Weisong
    Chen, Yuchen
    Chen, Xiao
    Wei, Zhao
    Zhang, Quanjun
    You, Yudu
    Luo, Bin
    Liu, Yang
    Chen, Zhenyu
    IEEE Transactions on Software Engineering, 2024, 50 (08) : 2077 - 2095
  • [44] Leveraging Code Generation to Improve Code Retrieval and Summarization via Dual Learning
    Ye, Wei
    Xie, Rui
    Zhang, Jinglei
    Hu, Tianxiang
    Wang, Xiaoyin
    Zhang, Shikun
    WEB CONFERENCE 2020: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2020), 2020, : 2309 - 2319
  • [45] Neural Code Comprehension: A Learnable Representation of Code Semantics
    Ben-Nun, Tal
    Jakobovits, Alice Shoshana
    Hoefler, Torsten
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [46] Code-to-Code Search Based on Deep Neural Network and Code Mutation
    Fujiwara, Yuji
    Yoshida, Norihiro
    Choi, Eunjong
    Inoue, Katsuro
    2019 IEEE 13TH INTERNATIONAL WORKSHOP ON SOFTWARE CLONES (IWSC '19), 2019, : 1 - 7
  • [47] Using Fuzzy Code Search to Link Code Fragments in Discussions to Source Code
    Bettenburg, Nicolas
    Thomas, Stephen W.
    Hassan, Ahmed E.
    2012 16TH EUROPEAN CONFERENCE ON SOFTWARE MAINTENANCE AND REENGINEERING (CSMR), 2012, : 319 - 328
  • [48] BLOCSUM: Block Scope-based Source Code Summarization via Shared Block Representation
    Choi, YunSeok
    Kim, Hyojun
    Lee, Jee-Hyong
    Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2023, : 11427 - 11441
  • [49] Deep Code Search
    Gu, Xiaodong
    Zhang, Hongyu
    Kim, Sunghun
    PROCEEDINGS 2018 IEEE/ACM 40TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE), 2018, : 933 - 944
  • [50] SNIPR: Complementing Code Search with Code Retargeting Capabilities
    Sanchez, Huascar A.
    PROCEEDINGS OF THE 35TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2013), 2013, : 1423 - 1426