Multi-grained contextual code representation learning for commit message generation

Cited by: 3
Authors
Wang, Chuangwei [1]
Zhang, Li [1]
Zhang, Xiaofang [1]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Code change; Code representation learning; Commit message generation; Pre-training; COMPLETION;
DOI
10.1016/j.infsof.2023.107393
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Commit messages, which precisely describe the code changes in each commit in natural language, allow developers and subsequent reviewers to understand those changes without digging into implementation details. However, the semantic and structural gap between code and natural language poses a significant challenge for commit message generation. Several researchers have proposed automated techniques to generate commit messages, but these techniques do not sufficiently exploit the information in the code. In this paper, we propose multi-grained contextual code representation learning for commit message generation (COMU). We extract multi-grained information from the changed code at the line and AST levels (i.e., Code_Diff and AST_Diff). In Code_Diff, we construct global contextual semantic information about the changed code and use three different tokens to mark whether a line of code has changed. In AST_Diff, we extract the code structure from source code changes and combine it with four types of editing operations to explicitly focus on the details of the changed part. In addition, we build the experimental datasets ourselves, since there is still no sufficiently large publicly available dataset for this task; releasing them should help advance research in this field. We perform extensive experiments to evaluate the effectiveness of COMU. The experimental evaluation and human study show that our model outperforms the baseline model.
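The line-level marking described for Code_Diff can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only states that three different tokens mark whether a line has changed, so the token names (`<keep>`, `<add>`, `<del>`), the function name `mark_lines`, and the use of Python's `difflib` to align the two versions are all assumptions made here for illustration.

```python
import difflib

# Hypothetical marker tokens; the abstract only says "three different tokens".
KEEP, ADD, DEL = "<keep>", "<add>", "<del>"


def mark_lines(old_src: str, new_src: str) -> list[tuple[str, str]]:
    """Tag every line of the old and new versions as kept, added, or deleted."""
    old_lines = old_src.splitlines()
    new_lines = new_src.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)

    marked: list[tuple[str, str]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged lines are kept so the model still sees the full context.
            marked += [(KEEP, line) for line in old_lines[i1:i2]]
        else:
            # "delete", "insert", and "replace" all reduce to removed old lines
            # followed by added new lines (either range may be empty).
            marked += [(DEL, line) for line in old_lines[i1:i2]]
            marked += [(ADD, line) for line in new_lines[j1:j2]]
    return marked


if __name__ == "__main__":
    old = "a = 1\nb = 2\nprint(a)"
    new = "a = 1\nb = 3\nprint(a)"
    for token, line in mark_lines(old, new):
        print(token, line)
```

Keeping unchanged lines in the sequence, rather than emitting only the changed ones, is one way to preserve the surrounding context that the abstract calls global contextual semantic information; how COMU actually encodes that context is not specified in this record.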
Pages: 14
Related papers
50 in total
  • [1] CoreGen: Contextualized Code Representation Learning for Commit Message Generation
    Nie, Lun Yiu
    Gao, Cuiyun
    Zhong, Zhicong
    Lam, Wai
    Liu, Yang
    Xu, Zenglin
    NEUROCOMPUTING, 2021, 459 : 97 - 107
  • [2] Combining Code Context and Fine-grained Code Difference for Commit Message Generation
    Xu, Shengbin
    Yao, Yuan
    Xu, Feng
    Gu, Tianxiao
    Tong, Hanghang
    13TH ASIA-PACIFIC SYMPOSIUM ON INTERNETWARE, INTERNETWARE 2022, 2022, : 242 - 251
  • [3] FIRA: Fine-Grained Graph-Based Code Change Representation for Automated Commit Message Generation
    Dong, Jinhao
    Lou, Yiling
    Zhu, Qihao
    Sun, Zeyu
    Li, Zhilin
    Zhang, Wenjie
    Hao, Dan
    2022 ACM/IEEE 44TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2022), 2022, : 970 - 981
  • [4] Multi-grained Representation Learning for Cross-modal Retrieval
    Zhao, Shengwei
    Xu, Linhai
    Liu, Yuying
    Du, Shaoyi
    PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 2194 - 2198
  • [5] MGCoT : Multi-Grained Contextual Transformer for table-based text generation
    Mo, Xianjie
    Xiang, Yang
    Pan, Youcheng
    Hou, Yongshuai
    Luo, Ping
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 250
  • [6] Context-Encoded Code Change Representation for Automated Commit Message Generation
    Thanh Trong Vu
    Thanh-Dat Do
    Hieu Dinh Vo
    INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, 2024, 34 (01) : 185 - 202
  • [7] Multi-Grained Cascade AdaBoost Extreme Learning Machine for Feature Representation
    Ge, Hongwei
    Sun, Weiting
    Zhao, Mingde
    Zhang, Kai
    Sun, Liang
    Yu, Chao
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019,
  • [8] A Technique to Detect Multi-grained Code Clones
    Yuki, Yusuke
    Higo, Yoshiki
    Kusumoto, Shinji
    2017 IEEE 11TH INTERNATIONAL WORKSHOP ON SOFTWARE CLONES (IWSC), 2017, : 54 - 60
  • [9] Commit Message Generation for Source Code Changes
    Xu, Shengbin
    Yao, Yuan
    Xu, Feng
    Gu, Tianxiao
    Tong, Hanghang
    Lu, Jian
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 3975 - 3981
  • [10] Enhancing identification for person search with multi-scale multi-grained representation learning
    Han, Zhixiong
    Ma, Bingpeng
    PATTERN RECOGNITION, 2024, 150