ESGen: Commit Message Generation Based on Edit Sequence of Code Change

Cited by: 0
Authors
Chen, Xiangping [1]
Li, Yangzi [1]
Tang, Zhicao [1]
Huang, Yuan [1]
Zhou, Haojie [1]
Tang, Mingdong [2]
Zheng, Zibin [1]
Affiliations
[1] Sun Yat Sen Univ, Guangzhou, Peoples R China
[2] Guangdong Univ Foreign Studies, Guangzhou, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
Commit Message Generation; Code Change; Edit Sequence; Bi-Encoder; Abstract Syntax Tree;
DOI
10.1145/3643916.3644414
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Commit messages provide important information for understanding code changes, and a number of researchers have tried to generate them automatically. Existing research on commit message generation has drawn on code tokens or code structures such as the abstract syntax tree (AST). Since the edit sequence of a code change is also important for capturing the intent of the change, we propose a new commit message generation method called ESGen, which extracts AST edit sequences of code changes as model input. Specifically, we employ an O(ND) difference algorithm to extract the edit sequence by comparing the ASTs before and after the code change is applied. Then, we construct a Bi-Encoder, which encodes both the textual information and the AST edit sequence information of the code change. The experimental results show that ESGen outperforms the baseline models, raising the BLEU-4 score to 15.14. In addition, applying the edit sequence to 7 baseline models improves their BLEU-4 scores by an average of 8.5%. A human evaluation also confirmed the effectiveness of ESGen in generating commit messages.
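The edit-sequence extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the ASTs are flattened into pre-order sequences of node-type names, uses Python's standard `ast` module as a stand-in parser (the record does not say which languages or tools ESGen uses), and the names `flatten_ast` and `myers_edit_sequence` are hypothetical. The diff itself follows the greedy O(ND) scheme of Myers.

```python
import ast


def flatten_ast(source):
    """Return the node-type names of a Python AST in pre-order (an assumed encoding)."""
    def visit(node):
        yield type(node).__name__
        for child in ast.iter_child_nodes(node):
            yield from visit(child)
    return list(visit(ast.parse(source)))


def myers_edit_sequence(old, new):
    """Shortest edit script from old to new as (op, token) pairs, via O(ND) diff."""
    n, m = len(old), len(new)
    v = {1: 0}   # v[k] = furthest x reached on diagonal k = x - y
    trace = []   # snapshot of v taken before each round d
    for d in range(n + m + 1):
        trace.append(dict(v))
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]       # step down: insert a token from new
            else:
                x = v[k - 1] + 1   # step right: delete a token from old
            y = x - k
            while x < n and y < m and old[x] == new[y]:
                x, y = x + 1, y + 1   # follow the diagonal of equal tokens
            v[k] = x
            if x >= n and y >= m:
                return _backtrack(trace, old, new)
    return []


def _backtrack(trace, old, new):
    """Walk the recorded rounds backwards to recover the edit operations."""
    x, y = len(old), len(new)
    ops = []
    for d in range(len(trace) - 1, -1, -1):
        v = trace[d]
        k = x - y
        if k == -d or (k != d and v[k - 1] < v[k + 1]):
            prev_k = k + 1
        else:
            prev_k = k - 1
        prev_x = v[prev_k]
        prev_y = prev_x - prev_k
        while x > prev_x and y > prev_y:
            x, y = x - 1, y - 1
            ops.append(("keep", old[x]))
        if d > 0:
            if x == prev_x:
                ops.append(("insert", new[prev_y]))
            else:
                ops.append(("delete", old[prev_x]))
        x, y = prev_x, prev_y
    ops.reverse()
    return ops


print(myers_edit_sequence(list("ABC"), list("ACD")))
# [('keep', 'A'), ('delete', 'B'), ('keep', 'C'), ('insert', 'D')]
```

Calling `myers_edit_sequence(flatten_ast(before), flatten_ast(after))` on two versions of a snippet gives a node-level edit sequence of the kind that could feed the second encoder. How ESGen actually serializes AST nodes and edit operations is not specified in this record.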
Pages: 112-124
Page count: 13