Using Class Imbalance Learning for Software Defect Prediction

Cited by: 389
Authors
Wang, Shuo [1]
Yao, Xin [1]
Affiliations
[1] Univ Birmingham, Sch Comp Sci, CERCIA, Birmingham B15 2TT, W Midlands, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Class imbalance learning; ensemble learning; negative correlation learning; software defect prediction; STATIC CODE ATTRIBUTES; NEURAL-NETWORKS; MACHINE;
DOI
10.1109/TR.2013.2259203
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
To facilitate software testing and save testing costs, a wide range of machine learning methods have been studied to predict defects in software modules. Unfortunately, the imbalanced nature of this type of data increases the learning difficulty of such a task. Class imbalance learning specializes in tackling classification problems with imbalanced distributions, which could be helpful for defect prediction, but has not been investigated in depth so far. In this paper, we study whether and how class imbalance learning methods can benefit software defect prediction, with the aim of finding better solutions. We investigate different types of class imbalance learning methods, including resampling techniques, threshold moving, and ensemble algorithms. Among the methods we studied, AdaBoost.NC shows the best overall performance in terms of measures including balance, G-mean, and Area Under the Curve (AUC). To further improve the performance of the algorithm and facilitate its use in software defect prediction, we propose a dynamic version of AdaBoost.NC, which adjusts its parameter automatically during training. Without the need to pre-define any parameters, it is shown to be more effective and efficient than the original AdaBoost.NC.
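The abstract names balance and G-mean as evaluation measures but does not define them. The following is a minimal sketch, assuming the definitions commonly used in the defect prediction literature: PD is the probability of detection (recall on the defective class), PF is the probability of false alarm, G-mean is sqrt(PD * (1 - PF)), and balance is one minus the normalized Euclidean distance from the point (PF, PD) to the ideal point (0, 1). The function name and the example counts are hypothetical and are not taken from the paper.

```python
import math


def defect_prediction_measures(tp, fn, fp, tn):
    """Compute PD, PF, G-mean and balance from confusion-matrix counts.

    Assumed definitions (not stated in the abstract itself):
      PD      = tp / (tp + fn)          recall on the defective class
      PF      = fp / (fp + tn)          false-alarm rate
      G-mean  = sqrt(PD * (1 - PF))
      balance = 1 - sqrt((0 - PF)^2 + (1 - PD)^2) / sqrt(2)
    """
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("both classes must be present in the test data")
    pd = tp / (tp + fn)
    pf = fp / (fp + tn)
    g_mean = math.sqrt(pd * (1.0 - pf))
    balance = 1.0 - math.sqrt((0.0 - pf) ** 2 + (1.0 - pd) ** 2) / math.sqrt(2.0)
    return {"pd": pd, "pf": pf, "g_mean": g_mean, "balance": balance}


# Hypothetical imbalanced test set: 10 defective and 90 clean modules.
measures = defect_prediction_measures(tp=8, fn=2, fp=18, tn=72)

# A classifier that labels every module as clean gets 90% accuracy on
# the same data, but detects no defects at all.
trivial = defect_prediction_measures(tp=0, fn=10, fp=0, tn=90)
```

Both measures reward a classifier only when it does well on the defective and the clean class at the same time, which is why they are preferred over plain accuracy on imbalanced data: the all-clean classifier above has a G-mean of zero despite its high accuracy.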
Pages: 434 - 443
Number of pages: 10
Related papers
50 in total
  • [42] Genetic algorithm-based oversampling approach to prune the class imbalance issue in software defect prediction
    Arun, C.
    Lakshmi, C.
    SOFT COMPUTING, 2022, 26 (23) : 12915 - 12931
  • [43] Software Defect Prediction Analysis Using Machine Learning Techniques
    Khalid, Aimen
    Badshah, Gran
    Ayub, Nasir
    Shiraz, Muhammad
    Ghouse, Mohamed
    SUSTAINABILITY, 2023, 15 (06)
  • [44] Software defect prediction using ensemble learning on selected features
    Laradji, Issam H.
    Alshayeb, Mohammad
    Ghouti, Lahouari
    INFORMATION AND SOFTWARE TECHNOLOGY, 2015, 58 : 388 - 402
  • [45] Combat with Class Overlapping in Software Defect Prediction Using Neighbourhood Metric
    Gupta S.
    Richa
    Kumar R.
    Jain K.L.
    SN Computer Science, 4 (5)
  • [46] Software Defect Prediction Analysis Using Machine Learning Algorithms
    Singh, Praman Deep
    Chug, Anuradha
    PROCEEDINGS OF THE 7TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, DATA SCIENCE AND ENGINEERING (CONFLUENCE 2017), 2017, : 775 - 781
  • [47] Handling Class Imbalance in Link Prediction Using Learning to Rank Techniques
    Li, Bopeng
    Chaudhuri, Sougata
    Tewari, Ambuj
    THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 4226 - 4227
  • [48] Active Learning for Software Defect Prediction
    Luo, Guangchun
    Ma, Ying
    Qin, Ke
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2012, E95D (06) : 1680 - 1683
  • [49] Implementation of Data Sampling in Class Imbalance Learning for Cross Project Defect Prediction : An Empirical Study
    Goel, Lipika
    Sharma, Mayank
    Khatri, Sunil Kumar
    Damodaran, D.
    2018 FIFTH INTERNATIONAL SYMPOSIUM ON INNOVATION IN INFORMATION AND COMMUNICATION TECHNOLOGY (ISIICT 2018), 2018, : 8 - 13
  • [50] Software bug priority prediction technique based on intuitionistic fuzzy representation and class imbalance learning
    Rama Ranjan Panda
    Naresh Kumar Nagwani
    Knowledge and Information Systems, 2024, 66 : 2135 - 2164