An Imbalanced Deep Learning Model for Bug Localization

Cited by: 2
Authors
Bui Thi Mai Anh [1]
Nguyen Viet Luyen [1]
Affiliation
[1] Hanoi Univ Sci & Technol, Sch Informat & Commun Technol, Lab Intelligent Software Engn, Hanoi, Vietnam
Keywords
bug localization; deep neural network; imbalanced data-set; bootstrapping
DOI
10.1109/APSECW53869.2021.00017
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Debugging and locating faulty source files are tedious and time-consuming tasks. To improve productivity and help developers focus on crucial files, automated bug localization models have been proposed for years. These models recommend buggy source files by ranking them according to their relevance to a given bug report. There are two significant challenges in this research field: (i) narrowing the lexical gap between bug reports, which are typically written in natural language, and source files, which are written in programming languages; (ii) reducing the impact of imbalanced data distribution in model training, as far fewer source files relate to a given bug report while the majority are not relevant. In this paper, we propose a deep neural network model that extracts essential information hidden within bug reports and source files by capturing not only lexical relations but also semantic details and domain knowledge features such as historical bug fixes and code change history. To address the skewed class distribution, we apply a focal loss function combined with a bootstrapping method that rectifies samples of the minority class within iterative training batches. We assessed the performance of our approach on six large-scale Java open-source projects. The empirical results show that the proposed method outperforms other state-of-the-art models, improving the Mean Average Precision (MAP) and Mean Reciprocal Rank (MRR) scores by 3% to 11% and by 2% to 14%, respectively.
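The abstract names a focal loss and two ranking metrics without spelling them out. The sketch below is a minimal illustration in plain Python, not the authors' implementation: it uses the standard binary focal loss with the commonly cited defaults `gamma=2.0` and `alpha=0.25` (the values used in the paper are not stated in the abstract), and the usual definitions of reciprocal rank and average precision over a ranked list of 0/1 relevance labels. All function names and the toy rankings are hypothetical, and the bootstrapping step is not shown because the abstract gives no details about it.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one sample (assumed standard form).

    p: predicted probability that the source file is relevant (class 1).
    y: true label, 1 for a buggy (relevant) file, 0 otherwise.
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)            # avoid log(0)
    p_t = p if y == 1 else 1.0 - p             # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def reciprocal_rank(ranked_labels):
    """1 / rank of the first relevant file, or 0.0 if none is relevant."""
    for rank, label in enumerate(ranked_labels, start=1):
        if label == 1:
            return 1.0 / rank
    return 0.0


def average_precision(ranked_labels):
    """Mean of precision@k taken at each rank k that holds a relevant file."""
    hits, total = 0, 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_over_reports(metric, rankings):
    """Average a per-report metric over all bug reports (gives MRR or MAP)."""
    return sum(metric(r) for r in rankings) / len(rankings)


# Hypothetical rankings for two bug reports: 1 marks a truly buggy file.
rankings = [[1, 0, 1], [0, 0, 1]]
mrr = mean_over_reports(reciprocal_rank, rankings)
map_score = mean_over_reports(average_precision, rankings)
```

With `gamma=0` and `alpha=0.5` the loss reduces to half the ordinary cross-entropy, which makes the down-weighting of easy samples easy to compare against a baseline.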
Pages: 32-40
Page count: 9