Evaluating Code Comment Generation With Summarized API Docs

Cited by: 0

Authors:

Matmti, Bilel [1]

Fard, Fatemeh [1]

Affiliations:

[1] Univ British Columbia, Dept Comp Sci, Okanagan, BC, Canada

Source:

2023 IEEE/ACM 2ND INTERNATIONAL WORKSHOP ON NATURAL LANGUAGE-BASED SOFTWARE ENGINEERING, NLBSE | 2023

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC);

Keywords:

API Docs; text summarization; comment generation; external knowledge source;

DOI:

10.1109/NLBSE59153.2023.00019

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Code comment generation is the task of generating a high-level natural language description for a given code snippet. API2Com is a comment generation model designed to leverage Application Programming Interface documentation (API Docs) as an external knowledge resource. Shahbazi et al. [1] showed that API Docs might help increase the model's performance. However, as the number of APIs used in a method increases, the documentation included in the input grows longer and the model's ability to generate pertinent comments deteriorates. In this paper, we evaluate how summarizing the API Docs with an extractive text summarization technique, TextRank, affects the overall performance of API2Com. The results of our experiments on the same Java dataset confirm the inverse correlation between the number of APIs and the model's performance: as the number of APIs increases, the performance metrics tend to deteriorate for both configurations of the model, with and without TextRank summarization of the API Docs. The experiments also show that the number of APIs affects the TextRank algorithm's capacity to improve the model's performance. For example, with 8 APIs, TextRank summarization improved the model's BLEU score by 18% on average, but this improvement tends to decrease as the number of APIs increases. Determining the winning combination of model configuration and documentation length therefore remains an open area of research.
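To make the summarization step concrete, the following is a minimal sketch of extractive TextRank applied to a block of API documentation text. It is not the authors' implementation: the function names, the naive sentence splitter, the `max_sentences` cut-off, and the damping and iteration settings are all assumptions chosen for illustration, and the abstract does not state how long the summaries used in the experiments were. The sentence similarity is the word-overlap measure commonly used with TextRank.

```python
import math
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Word overlap normalized by the log of the two sentence lengths."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    # Sentences with fewer than two distinct words would give log(1) = 0.
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0
    return len(words_a & words_b) / (math.log(len(words_a)) + math.log(len(words_b)))


def textrank_summary(text, max_sentences=2, damping=0.85, iterations=50):
    """Return the top-ranked sentences, kept in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= max_sentences:
        return sentences

    # Weighted, undirected sentence graph (no self-loops).
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]

    # Power iteration of the weighted PageRank update.
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]

    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:max_sentences]
    return [sentences[i] for i in sorted(top)]
```

In a pipeline like the one the abstract describes, a function of this kind would be applied to the documentation of each API before it is concatenated into the model input, so that the input length grows more slowly with the number of APIs. A sentence that shares no words with the rest of the documentation receives only the baseline score and is dropped first.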

Pages: 60-63

Page count: 4
