Multi-grained contextual code representation learning for commit message generation

Cited by: 3
Authors
Wang, Chuangwei [1]
Zhang, Li [1]
Zhang, Xiaofang [1]
Affiliations
[1] Soochow University, School of Computer Science and Technology, Suzhou, People's Republic of China
Funding
National Natural Science Foundation of China;
Keywords
Code change; Code representation learning; Commit message generation; Pre-training; COMPLETION;
DOI
10.1016/j.infsof.2023.107393
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Commit messages, which precisely describe the code changes in each commit in natural language, let developers and subsequent reviewers understand those changes without digging into implementation details. However, the semantic and structural gap between code and natural language poses a significant challenge for commit message generation. Several researchers have proposed automated techniques to generate commit messages, but these techniques do not sufficiently exploit the information in the code. In this paper, we propose COMU, a multi-grained contextual code representation learning approach for commit message generation. We extract multi-grained information from the changed code at the line and AST levels (i.e., Code_Diff and AST_Diff). In Code_Diff, we construct global contextual semantic information about the changed code and mark whether each line of code has changed using three different tokens. In AST_Diff, we extract the code structure from source code changes and combine it with four types of editing operations to explicitly focus on the details of the changed part. In addition, we build our own experimental dataset, since no sufficient publicly available dataset exists for this task; releasing this dataset should help advance research in this field. We perform extensive experiments to evaluate the effectiveness of COMU. The experimental evaluation and human study show that our model outperforms the baseline model.
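The line-level marking described for Code_Diff can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only says that three different tokens mark whether a line has changed, so the token names `<KEEP>`, `<ADD>`, and `<DEL>`, the helper `mark_lines`, and the use of Python's standard `difflib` to compute the diff are all assumptions made for illustration.

```python
import difflib

# Hypothetical marker tokens; the abstract does not name the three tokens.
KEEP, ADD, DEL = "<KEEP>", "<ADD>", "<DEL>"


def mark_lines(old_src: str, new_src: str) -> list[tuple[str, str]]:
    """Tag every line of a change as kept, added, or deleted.

    Unchanged lines are retained so the changed lines stay embedded
    in their surrounding context.
    """
    old, new = old_src.splitlines(), new_src.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    marked: list[tuple[str, str]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            marked += [(KEEP, line) for line in old[i1:i2]]
        else:
            # 'delete' has an empty new range, 'insert' an empty old range;
            # 'replace' is emitted as deletions followed by additions.
            marked += [(DEL, line) for line in old[i1:i2]]
            marked += [(ADD, line) for line in new[j1:j2]]
    return marked


if __name__ == "__main__":
    before = "a = 1\nb = 2\nreturn a"
    after = "a = 1\nb = 3\nreturn a"
    for token, line in mark_lines(before, after):
        print(token, line)
```

Keeping the unchanged lines alongside the marked ones is one plausible way to give a model both the edit and its context; how COMU actually encodes this information is specified in the paper itself, not here.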
Pages: 14
Related papers
50 in total
  • [21] Video anomaly detection via pseudo-anomaly generation and multi-grained feature learning
    Deng, Haigang
    Yang, Qingyang
    Li, Chengwei
    Liang, Hanzhong
    Wang, Chuanxu
    JOURNAL OF ELECTRONIC IMAGING, 2025, 34 (01)
  • [22] ESGen: Commit Message Generation Based on Edit Sequence of Code Change
    Chen, Xiangping
    Li, Yangzi
    Tang, Zhicao
    Huang, Yuan
    Zhou, Haojie
    Tang, Mingdong
    Zheng, Zibin
    PROCEEDINGS 2024 32ND IEEE/ACM INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION, ICPC 2024, 2024, : 112 - 124
  • [23] Improving radiology report generation with multi-grained abnormality prediction
    Jin, Yuda
    Chen, Weidong
    Tian, Yuanhe
    Song, Yan
    Yan, Chenggang
    NEUROCOMPUTING, 2024, 600
  • [24] Learning the Multi-Grained Process Attributes for Industrial Fault Classification
    Zhou, Han
    Li, Yanxia
    Zhao, Dandan
    Yin, Hongpeng
    Chai, Yi
    PROCEEDINGS OF THE 33RD CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2021), 2021, : 2098 - 2103
  • [25] Multi-Grained Deep Cascade Learning for ECG Biometric Recognition
    Wang, Sujuan
    Zhang, Ruili
    TRAITEMENT DU SIGNAL, 2023, 40 (02) : 683 - 691
  • [26] Multi-Grained Deep Feature Learning for Robust Pedestrian Detection
    Lin, Chunze
    Lu, Jiwen
    Zhou, Jie
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2019, 29 (12) : 3608 - 3621
  • [27] Clinical Trial Retrieval via Multi-grained Similarity Learning
    Luo, Junyu
    Qian, Cheng
    Glass, Lucas
    Ma, Fenglong
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 2950 - 2954
  • [28] Multi-Grained Radiology Report Generation With Sentence-Level Image-Language Contrastive Learning
    Liu, Aohan
    Guo, Yuchen
    Yong, Jun-Hai
    Xu, Feng
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2024, 43 (07) : 2657 - 2669
  • [29] Multi-Grained Attention Representation With ALBERT for Aspect-Level Sentiment Classification
    Chen, Yuezhe
    Kong, Lingyun
    Wang, Yang
    Kong, Dezhi
    IEEE ACCESS, 2021, 9 : 106703 - 106713
  • [30] Commit Message Generation from Code Differences using Hidden Markov Models
    Awad, Ahmed
    Nagaty, Khaled
    PROCEEDINGS OF 2019 8TH INTERNATIONAL CONFERENCE ON SOFTWARE AND INFORMATION ENGINEERING (ICSIE 2019), 2019, : 96 - 99