Evaluating Stratification Alternatives to Improve Software Defect Prediction

Cited by: 38
Authors
Pelayo, Lourdes [1]
Dick, Scott [1]
Affiliations
[1] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Learning in imbalanced datasets; machine learning; non-parametric models; software fault-proneness; software reliability; stratification; KNOWLEDGE DISCOVERY; QUALITY; CLASSIFICATION; METRICS; SMOTE; RELIABILITY; VALIDITY; MODULES; UTILITY; MODELS;
DOI
10.1109/TR.2012.2183912
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Numerous studies have applied machine learning to the software defect prediction problem, i.e., predicting which modules will experience a failure during operation based on software metrics. However, skewness in defect-prediction datasets can mean that the resulting classifiers often predict the faulty (minority) class less accurately. This problem is well known in machine learning, and is often referred to as "learning from imbalanced datasets." One common approach for mitigating skewness is to use stratification to homogenize class distributions; however, it is unclear which stratification techniques are most effective, both generally and specifically in software defect prediction. In this article, we investigate two major stratification alternatives (under- and over-sampling) for software defect prediction using Analysis of Variance. Our analysis covers several modern software defect prediction datasets using a factorial design. We find that the main effect of under-sampling is significant at alpha = 0.05, as is the interaction between under- and over-sampling. However, the main effect of over-sampling is not significant.
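The two stratification alternatives named in the abstract can be illustrated with a minimal sketch. It assumes plain random resampling (discarding majority-class examples and duplicating minority-class examples), which is not necessarily the procedure the authors used. The function name `resample`, its parameters, and the factor levels in the grid are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter
from itertools import product


def resample(labels, under_pct, over_pct, seed=0):
    """Return example indices after random under- and over-sampling.

    under_pct: percentage of majority-class examples to discard (0-100).
    over_pct:  percentage of extra minority-class examples to add by
               duplication (0 adds none, 100 doubles the minority class).
    Both parameters are illustrative assumptions, not the paper's settings.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)  # least frequent label
    min_idx = [i for i, y in enumerate(labels) if y == minority]
    maj_idx = [i for i, y in enumerate(labels) if y != minority]

    # Under-sampling: keep a random subset of the majority class.
    keep = round(len(maj_idx) * (1 - under_pct / 100))
    kept_majority = rng.sample(maj_idx, keep)

    # Over-sampling: draw minority examples with replacement.
    extra = round(len(min_idx) * over_pct / 100)
    added_minority = [rng.choice(min_idx) for _ in range(extra)]

    return sorted(kept_majority + min_idx + added_minority)


# A skewed toy dataset: 90 non-faulty (0) and 10 faulty (1) modules.
labels = [0] * 90 + [1] * 10
idx = resample(labels, under_pct=50, over_pct=200)
balanced = Counter(labels[i] for i in idx)

# A factorial design crosses every under-sampling level with every
# over-sampling level; these levels are hypothetical.
grid = list(product([0, 25, 50], [0, 100, 200]))
class_counts = {
    (u, o): Counter(labels[i] for i in resample(labels, u, o))
    for u, o in grid
}
```

In a full experiment, each cell of `grid` would be used to train and evaluate a classifier, and the resulting scores would feed a two-way Analysis of Variance that tests the two main effects and their interaction.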
Pages: 516-525
Page count: 10