Multi-grained contextual code representation learning for commit message generation

Citations: 3
Authors
Wang, Chuangwei [1]
Zhang, Li [1]
Zhang, Xiaofang [1]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Code change; Code representation learning; Commit message generation; Pre-training; COMPLETION;
DOI
10.1016/j.infsof.2023.107393
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Subject classification code
0812;
Abstract
Commit messages, which precisely describe the code changes of each commit in natural language, make it possible for developers and subsequent reviewers to understand the code changes without digging into implementation details. However, the semantic and structural gap between code and natural language poses a significant challenge for commit message generation. Several researchers have proposed automated techniques to generate commit messages. Nevertheless, these techniques do not sufficiently exploit the information in the code. In this paper, we propose multi-grained contextual code representation learning for commit message generation (COMU). We extract multi-grained information from the changed code at the line and AST levels (i.e., Code_Diff and AST_Diff). In Code_Diff, we construct global contextual semantic information about the changed code and mark whether each line of code has changed using three different tokens. In AST_Diff, we extract the code structure from source code changes and combine the extracted structure with four types of editing operations to explicitly focus on the details of the changed part. In addition, we build the experimental datasets ourselves, since no sufficiently large public dataset exists for this task. The release of this dataset would contribute to advancing research in this field. We conduct extensive experiments to evaluate the effectiveness of COMU. The experimental evaluation and human study show that our model outperforms the baseline model.
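As a rough illustration of the line-level marking the abstract describes for Code_Diff, the sketch below tags each line of a change with one of three tokens. It is not the authors' implementation: the token names `<keep>`, `<add>`, and `<del>`, the function `mark_lines`, and the use of Python's `difflib` are all assumptions made for this example, since the abstract does not specify the actual tokens or how they are computed.

```python
import difflib

# Hypothetical marker tokens; the abstract does not name the actual ones.
KEEP, ADD, DEL = "<keep>", "<add>", "<del>"


def mark_lines(old_src: str, new_src: str) -> list[tuple[str, str]]:
    """Tag every line of a change as unchanged, added, or deleted."""
    old_lines = old_src.splitlines()
    new_lines = new_src.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    marked: list[tuple[str, str]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged lines are kept as context around the change.
            marked += [(KEEP, line) for line in old_lines[i1:i2]]
        else:
            # 'delete', 'insert', and 'replace' all reduce to
            # deleted old lines followed by added new lines.
            marked += [(DEL, line) for line in old_lines[i1:i2]]
            marked += [(ADD, line) for line in new_lines[j1:j2]]
    return marked


if __name__ == "__main__":
    old = "a = 1\nb = 2\nreturn a"
    new = "a = 1\nb = 3\nreturn a"
    for token, line in mark_lines(old, new):
        print(token, line)
```

Keeping the unchanged lines in the output, rather than only the changed ones, is what gives a downstream model the surrounding context of the change; how COMU actually encodes that context is described in the paper itself.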
Pages: 14