A Comparison of Pretrained Models for Classifying Issue Reports

Cited by: 0
Authors
Heo, Jueun [1 ]
Kwon, Gibeom [1 ]
Kwak, Changwon [1 ]
Lee, Seonah [1 ,2 ]
Affiliations
[1] Gyeongsang Natl Univ, Dept AI Convergence Engn, Jinju 52828, South Korea
[2] Gyeongsang Natl Univ, Dept Software Engn, Jinju 52828, South Korea
Source
IEEE ACCESS | 2024, Vol. 12
Funding
National Research Foundation of Singapore;
Keywords
Task analysis; Data models; Software engineering; Computer bugs; Codes; Bidirectional control; Encoding; Issue reports; issue classification; BERT; pretrained models; deep learning techniques;
DOI
10.1109/ACCESS.2024.3408688
CLC Classification Number
TP [Automation technology; computer technology];
Subject Classification Code
0812 ;
Abstract
Issues are evolving requirements in software engineering and a main driver of software evolution costs. To help developers manage issues, GitHub provides labeling mechanisms in its issue management system. However, manually labeling issue reports still imposes a considerable workload on developers. To ease this burden, researchers have proposed automatically classifying issue reports, adopting deep learning techniques and pretrained models to improve classification accuracy. However, general-domain pretrained models such as RoBERTa have limitations in understanding the contexts of software engineering tasks. In this paper, we create a pretrained model, IssueBERT, from issue data to examine whether a domain-specific pretrained model can improve the accuracy of issue report classification. We also adopt and explore several pretrained models from the software engineering domain, namely CodeBERT, BERTOverflow, and seBERT. We conduct a comparative experiment on these pretrained models to evaluate their performance in classifying issue reports. Our results show that IssueBERT outperforms the other pretrained models. Notably, IssueBERT yields an average F1 score 1.74% higher than that of seBERT and 3.61% higher than that of RoBERTa, even though IssueBERT was pretrained with much less data than either model.
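The abstract compares models by average F1 score across issue categories. As a minimal sketch of that evaluation metric (the label names "bug", "feature", and "question" here are hypothetical examples of issue classes, not taken from the paper), macro-averaged F1 can be computed in plain Python:

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute per-label F1, then take the unweighted mean."""
    labels = set(y_true)
    f1_scores = []
    for label in labels:
        # Per-label counts: true positives, false positives, false negatives
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)

# Hypothetical gold and predicted labels for six issue reports
gold = ["bug", "bug", "feature", "feature", "question", "bug"]
pred = ["bug", "feature", "feature", "feature", "question", "bug"]
print(round(macro_f1(gold, pred), 3))  # → 0.867
```

Macro-averaging weights every class equally regardless of its frequency, which matters for issue data where, for example, bug reports typically dominate the other categories.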
Pages: 79568-79584
Page count: 17
Related Papers (50 total)
  • [1] An Intelligent Tool for Classifying Issue Reports
    Laiq, Muhammad
    2023 IEEE/ACM 2ND INTERNATIONAL WORKSHOP ON NATURAL LANGUAGE-BASED SOFTWARE ENGINEERING, NLBSE, 2023, : 13 - 15
  • [2] Automatic Component Prediction for Issue Reports Using Fine-Tuned Pretrained Language Models
    Wang, Dae-Sung
    Lee, Chan-Gun
    IEEE ACCESS, 2022, 10 : 131456 - 131468
  • [3] Automatic Issue Classifier: A Transfer Learning Framework for Classifying Issue Reports
    Nadeem, Anas
    Sarwar, Muhammad Usman
    Malik, Muhammad Zubair
    2021 IEEE INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING WORKSHOPS (ISSREW 2021), 2021, : 421 - 426
  • [4] A taxonomy for mining and classifying privacy requirements in issue reports
    Sangaroonsilp, Pattaraporn
    Dam, Hoa Khanh
    Choetkiertikul, Morakot
    Ragkhitwetsagul, Chaiyong
    Ghose, Aditya
    INFORMATION AND SOFTWARE TECHNOLOGY, 2023, 157
  • [5] Classifying Software Issue Reports through Association Mining
    Zolkeply, Mohd Syafiq
    Shao, Jianhua
    SAC '19: PROCEEDINGS OF THE 34TH ACM/SIGAPP SYMPOSIUM ON APPLIED COMPUTING, 2019, : 1860 - 1863
  • [6] EfficientNets for DeepFake Detection: Comparison of Pretrained Models
    Pokroy, Artem A.
    Egorov, Alexey D.
    PROCEEDINGS OF THE 2021 IEEE CONFERENCE OF RUSSIAN YOUNG RESEARCHERS IN ELECTRICAL AND ELECTRONIC ENGINEERING (ELCONRUS), 2021, : 598 - 600
  • [7] Performance Comparison of Pretrained Deep Learning Models for Landfill Waste Classification
    Younis, Hussein
    Obaid, Mahmoud
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (11) : 689 - 698
  • [8] Leveraging pretrained language models for seizure frequency extraction from epilepsy evaluation reports
    Abeysinghe, Rashmie
    Tao, Shiqiang
    Lhatoo, Samden D.
    Zhang, Guo-Qiang
    Cui, Licong
    NPJ DIGITAL MEDICINE, 8 (1)
  • [9] A Survey of Pretrained Language Models
    Sun, Kaili
    Luo, Xudong
    Luo, Michael Y.
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT II, 2022, 13369 : 442 - 456
  • [10] Domain-adapted Large Language Models for Classifying Nuclear Medicine Reports
    Huemann, Zachary
    Lee, Changhee
    Hu, Junjie
    Cho, Steve Y.
    Bradshaw, Tyler J.
    RADIOLOGY-ARTIFICIAL INTELLIGENCE, 2023, 5 (06)