Evaluating Stratification Alternatives to Improve Software Defect Prediction

Cited by: 38
Authors
Pelayo, Lourdes [1]
Dick, Scott [1]
Affiliation
[1] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Learning in imbalanced datasets; machine learning; non-parametric models; software fault-proneness; software reliability; stratification; KNOWLEDGE DISCOVERY; QUALITY; CLASSIFICATION; METRICS; SMOTE; RELIABILITY; VALIDITY; MODULES; UTILITY; MODELS
DOI
10.1109/TR.2012.2183912
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Numerous studies have applied machine learning to the software defect prediction problem, i.e., predicting which modules will experience a failure during operation based on software metrics. However, skewness in defect-prediction datasets can mean that the resulting classifiers often predict the faulty (minority) class less accurately. This problem is well known in machine learning and is often referred to as "learning from imbalanced datasets." One common approach for mitigating skewness is to use stratification to homogenize class distributions; however, it is unclear which stratification techniques are most effective, both in general and specifically in software defect prediction. In this article, we investigate two major stratification alternatives (under- and over-sampling) for software defect prediction using Analysis of Variance. Our analysis covers several modern software defect prediction datasets using a factorial design. We find that the main effect of under-sampling is significant at α = 0.05, as is the interaction between under- and over-sampling. However, the main effect of over-sampling is not significant.
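The abstract contrasts two stratification alternatives and reports main effects and an interaction from a factorial design. The sketch below illustrates both ideas under stated assumptions; it is not the authors' code. Random under-sampling discards majority-class (non-faulty) modules, while over-sampling adds synthetic minority-class (faulty) modules. SMOTE-style interpolation is assumed for the over-sampler only because SMOTE appears among the keywords. The function names (`random_undersample`, `smote_oversample`, `factorial_effects`), their parameters, and the simplified two-level (off/on) factorial layout are hypothetical. `factorial_effects` only estimates the effects that an ANOVA would then test for significance; the F-tests themselves are omitted.

```python
import math
import random

# Hypothetical sketch of the two stratification alternatives; not the paper's code.


def random_undersample(majority, keep_fraction, rng):
    """Randomly keep `keep_fraction` (0 < f <= 1) of the majority-class rows."""
    k = max(1, round(len(majority) * keep_fraction))
    return rng.sample(majority, k)


def smote_oversample(minority, percent, k, rng):
    """SMOTE-style over-sampling (assumed technique): add `percent`% synthetic
    minority rows, each placed at a random point on the segment between a
    minority row and one of its k nearest minority neighbours.
    Needs at least two minority rows."""
    n_new = round(len(minority) * percent / 100)
    synthetic = []
    for i in range(n_new):
        idx = i % len(minority)  # cycle through the original minority rows
        base = minority[idx]
        others = [p for j, p in enumerate(minority) if j != idx]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> nb
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return minority + synthetic


def factorial_effects(y):
    """Effect estimates in a simplified 2x2 factorial design. y[(a, b)] is the
    mean response with under-sampling at level a and over-sampling at level b
    (0 = off, 1 = on). Returns (main_under, main_over, interaction)."""
    main_under = (y[1, 0] + y[1, 1] - y[0, 0] - y[0, 1]) / 2
    main_over = (y[0, 1] + y[1, 1] - y[0, 0] - y[1, 0]) / 2
    interaction = (y[1, 1] - y[1, 0] - y[0, 1] + y[0, 0]) / 2
    return main_under, main_over, interaction


if __name__ == "__main__":
    rng = random.Random(0)
    majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(90)]  # non-faulty
    minority = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(10)]  # faulty
    under = random_undersample(majority, 0.5, rng)
    over = smote_oversample(minority, 200, 5, rng)
    print(len(under), len(over))  # 45 30
    # Toy cell means chosen only to show the arithmetic; not results from the paper.
    print(factorial_effects({(0, 0): 1.0, (1, 0): 3.0, (0, 1): 2.0, (1, 1): 6.0}))  # (3.0, 2.0, 1.0)
```

Under-sampling shrinks the training set and may discard informative non-faulty modules, whereas over-sampling keeps every original row but adds interpolated ones. In a real study, each treatment cell would typically be filled from repeated runs per dataset, and the effects would then be tested with ANOVA at a chosen significance level such as α = 0.05.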
Pages: 516-525
Page count: 10
Related papers (50 in total)
  • [41] Evaluating Software Metrics for Sorting Software Modules in Order of Defect Count
    Yang, Xiaoxing
    ICSOFT: PROCEEDINGS OF THE 14TH INTERNATIONAL CONFERENCE ON SOFTWARE TECHNOLOGIES, 2019, : 94 - 105
  • [42] Software defect association mining and defect correction effort prediction
    Song, QB
    Shepperd, M
    Cartwright, M
    Mair, C
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2006, 32 (02) : 69 - 82
  • [43] The Stability of Threshold Values for Software Metrics in Software Defect Prediction
    Mausa, Goran
    Grbac, Tihana Galinac
    MODEL AND DATA ENGINEERING (MEDI 2017), 2017, 10563 : 81 - 95
  • [44] Evaluating Static Analysis Defect Warnings On Production Software
    Ayewah, Nathaniel
    Pugh, William
    Morgenthaler, J. David
    Penix, John
    Zhou, YuQian
    PASTE'07 PROCEEDINGS OF THE 2007 ACM SIGPLAN- SIGSOFT WORKSHOP ON PROGRAM ANALYSIS FOR SOFTWARE TOOLS & ENGINEERING, 2007, : 1 - +
  • [45] Early Software Defect Prediction: Right-Shifting Software Effort Data into a Defect Curve
    Okumoto, Kazuhira
    2022 IEEE INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING WORKSHOPS (ISSREW 2022), 2022, : 43 - 48
  • [46] Multilabel classification for defect prediction in software engineering
    Pachouly, Jalaj
    Ahirrao, Swati
    Kotecha, Ketan
    Kulkarni, Ambarish
    Alfarhood, Sultan
    SCIENTIFIC REPORTS, 2025, 15 (01)
  • [47] A Survey on Software Defect Prediction in Cross Project
    Jadhav, Rohini
    Joshi, Shashank D.
    Thorat, Umesh
    Joshi, Aditi S.
    PROCEEDINGS OF THE 2019 6TH INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT (INDIACOM), 2019, : 1014 - 1019
  • [48] A Rough Set Model for Software Defect Prediction
    Yang Weimin
    Li Longshu
    INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTATION TECHNOLOGY AND AUTOMATION, VOL 1, PROCEEDINGS, 2008, : 747 - +
  • [49] Current Software Defect Prediction: A Systematic Review
    Bala, Yahaya Zakariyau
    Samat, Pathiah Abdul
    Sharif, Khaironi Yatim
    Manshor, Noridayu
    Proceedings - AiIC 2022: 2022 Applied Informatics International Conference: Digital Innovation in Applied Informatics during the Pandemic, 2022, : 117 - 121
  • [50] On Software Defect Prediction Using Machine Learning
    Ren, Jinsheng
    Qin, Ke
    Ma, Ying
    Luo, Guangchun
    JOURNAL OF APPLIED MATHEMATICS, 2014