A Comparison of Pretrained Models for Classifying Issue Reports

Cited by: 0
Authors
Heo, Jueun [1 ]
Kwon, Gibeom [1 ]
Kwak, Changwon [1 ]
Lee, Seonah [1 ,2 ]
Affiliations
[1] Gyeongsang Natl Univ, Dept AI Convergence Engn, Jinju 52828, South Korea
[2] Gyeongsang Natl Univ, Dept Software Engn, Jinju 52828, South Korea
Source
IEEE ACCESS | 2024, Vol. 12
Funding
National Research Foundation of Singapore;
Keywords
Task analysis; Data models; Software engineering; Computer bugs; Codes; Bidirectional control; Encoding; Issue reports; issue classification; BERT; pretrained models; deep learning techniques;
DOI
10.1109/ACCESS.2024.3408688
CLC Classification Number
TP [Automation technology; computer technology];
Subject Classification Code
0812 ;
Abstract
Issues are evolving requirements in software engineering and a main driver of software evolution costs. To help developers manage issues, GitHub provides labeling mechanisms in its issue management system. However, manually labeling issue reports still imposes a considerable workload on developers. To ease this burden, researchers have proposed automatically classifying issue reports, adopting deep learning techniques and pretrained models to improve classification accuracy. However, general-domain pretrained models such as RoBERTa have limitations in understanding the contexts of software engineering tasks. In this paper, we create a pretrained model, IssueBERT, from issue data to examine whether a domain-specific pretrained model can improve the accuracy of issue report classification. We also adopt and explore several pretrained models from the software engineering domain, namely CodeBERT, BERTOverflow, and seBERT. We conduct a comparative experiment on these pretrained models to evaluate their performance in classifying issue reports. Our results show that IssueBERT outperforms the other pretrained models. Notably, IssueBERT yields an average F1 score 1.74% higher than that of seBERT and 3.61% higher than that of RoBERTa, even though IssueBERT was pretrained with much less data than either model.
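The abstract compares models by average F1 score across issue categories. As a minimal sketch of that evaluation metric (the label names "bug", "feature", and "question" here are hypothetical examples of issue classes, not taken from the paper), macro-averaged F1 can be computed in plain Python:

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute per-label F1, then take the unweighted mean."""
    labels = set(y_true)
    f1_scores = []
    for label in labels:
        # Per-label counts: true positives, false positives, false negatives
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)

# Hypothetical gold and predicted labels for six issue reports
gold = ["bug", "bug", "feature", "feature", "question", "bug"]
pred = ["bug", "feature", "feature", "feature", "question", "bug"]
print(round(macro_f1(gold, pred), 3))  # → 0.867
```

Macro-averaging weights every class equally regardless of its frequency, which matters for issue data where, for example, bug reports typically dominate the other categories.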
Pages: 79568-79584
Page count: 17
Related Papers (50 total)
  • [1] An Intelligent Tool for Classifying Issue Reports
    Laiq, Muhammad
    2023 IEEE/ACM 2ND INTERNATIONAL WORKSHOP ON NATURAL LANGUAGE-BASED SOFTWARE ENGINEERING, NLBSE, 2023, : 13 - 15
  • [2] Automatic Component Prediction for Issue Reports Using Fine-Tuned Pretrained Language Models
    Wang, Dae-Sung
    Lee, Chan-Gun
    IEEE ACCESS, 2022, 10 : 131456 - 131468
  • [3] Automatic Issue Classifier: A Transfer Learning Framework for Classifying Issue Reports
    Nadeem, Anas
    Sarwar, Muhammad Usman
    Malik, Muhammad Zubair
    2021 IEEE INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING WORKSHOPS (ISSREW 2021), 2021, : 421 - 426
  • [4] A taxonomy for mining and classifying privacy requirements in issue reports
    Sangaroonsilp, Pattaraporn
    Dam, Hoa Khanh
    Choetkiertikul, Morakot
    Ragkhitwetsagul, Chaiyong
    Ghose, Aditya
    INFORMATION AND SOFTWARE TECHNOLOGY, 2023, 157
  • [5] Classifying Software Issue Reports through Association Mining
    Zolkeply, Mohd Syafiq
    Shao, Jianhua
    SAC '19: PROCEEDINGS OF THE 34TH ACM/SIGAPP SYMPOSIUM ON APPLIED COMPUTING, 2019, : 1860 - 1863
  • [6] EfficientNets for DeepFake Detection: Comparison of Pretrained Models
    Pokroy, Artem A.
    Egorov, Alexey D.
    PROCEEDINGS OF THE 2021 IEEE CONFERENCE OF RUSSIAN YOUNG RESEARCHERS IN ELECTRICAL AND ELECTRONIC ENGINEERING (ELCONRUS), 2021, : 598 - 600
  • [7] Performance Comparison of Pretrained Deep Learning Models for Landfill Waste Classification
    Younis, Hussein
    Obaid, Mahmoud
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (11) : 689 - 698
  • [8] Leveraging pretrained language models for seizure frequency extraction from epilepsy evaluation reports
    Abeysinghe, Rashmie
    Tao, Shiqiang
    Lhatoo, Samden D.
    Zhang, Guo-Qiang
    Cui, Licong
    NPJ DIGITAL MEDICINE, 8 (1)
  • [9] A Survey of Pretrained Language Models
    Sun, Kaili
    Luo, Xudong
    Luo, Michael Y.
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT II, 2022, 13369 : 442 - 456
  • [10] Domain-adapted Large Language Models for Classifying Nuclear Medicine Reports
    Huemann, Zachary
    Lee, Changhee
    Hu, Junjie
    Cho, Steve Y.
    Bradshaw, Tyler J.
    RADIOLOGY-ARTIFICIAL INTELLIGENCE, 2023, 5 (06)