Evaluating Stratification Alternatives to Improve Software Defect Prediction

Cited by: 38
Authors
Pelayo, Lourdes [1]
Dick, Scott [1]
Affiliations
[1] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Learning in imbalanced datasets; machine learning; non-parametric models; software fault-proneness; software reliability; stratification; KNOWLEDGE DISCOVERY; QUALITY; CLASSIFICATION; METRICS; SMOTE; RELIABILITY; VALIDITY; MODULES; UTILITY; MODELS;
DOI
10.1109/TR.2012.2183912
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Numerous studies have applied machine learning to the software defect prediction problem, i.e., predicting which modules will experience a failure during operation based on software metrics. However, skewness in defect-prediction datasets can mean that the resulting classifiers predict the faulty (minority) class less accurately. This problem is well known in machine learning, and is often referred to as "learning from imbalanced datasets." One common approach for mitigating skewness is to use stratification to homogenize class distributions; however, it is unclear which stratification techniques are most effective, both generally and specifically in software defect prediction. In this article, we investigate two major stratification alternatives (under- and over-sampling) for software defect prediction using Analysis of Variance. Our analysis covers several modern software defect prediction datasets using a factorial design. We find that the main effect of under-sampling is significant at alpha = 0.05, as is the interaction between under- and over-sampling. However, the main effect of over-sampling is not significant.
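As a rough illustration of the two stratification alternatives named in the abstract, the sketch below applies random under-sampling of the majority class and random duplication of the minority class over a small grid of factor levels. It is not the authors' procedure: the function name `resample`, the parameters `under` and `over`, the toy data, and the chosen levels are all assumptions made for this example, and the record does not state which over-sampling algorithm or levels the paper used.

```python
import random
from collections import Counter

def resample(rows, labels, minority, under=0.0, over=0.0, seed=0):
    """Hypothetical helper: drop a fraction `under` of majority rows and
    add `over` * (minority count) randomly duplicated minority rows."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]

    # Under-sampling: keep a random subset of the majority class.
    keep = round(len(majority_idx) * (1.0 - under))
    kept_majority = rng.sample(majority_idx, keep)

    # Over-sampling: duplicate randomly chosen minority rows.
    extra = round(len(minority_idx) * over)
    duplicated = [rng.choice(minority_idx) for _ in range(extra)]

    chosen = kept_majority + minority_idx + duplicated
    rng.shuffle(chosen)
    return [rows[i] for i in chosen], [labels[i] for i in chosen]

# Toy data (assumed): 90 non-faulty modules (0) and 10 faulty modules (1).
rows = [[i] for i in range(100)]
labels = [0] * 90 + [1] * 10

# A 2x2 grid of factor levels, in the spirit of a factorial design.
results = {}
for under in (0.0, 0.5):
    for over in (0.0, 1.0):
        _, y = resample(rows, labels, minority=1, under=under, over=over)
        results[(under, over)] = Counter(y)
        print(under, over, dict(results[(under, over)]))
```

In a study like the one described, each cell of such a grid would be used to train and score a classifier, and the scores would then be analyzed with an Analysis of Variance to separate main effects from the interaction.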
Pages: 516 - 525
Page count: 10
Related papers
50 in total
  • [1] Analysis of Evolutionary Algorithms to Improve Software Defect Prediction
    Malhotra, Ruchika
    Khurana, Anshu
    2017 6TH INTERNATIONAL CONFERENCE ON RELIABILITY, INFOCOM TECHNOLOGIES AND OPTIMIZATION (TRENDS AND FUTURE DIRECTIONS) (ICRITO), 2017, : 301 - 305
  • [2] Exploring better alternatives to size metrics for explainable software defect prediction
    Chai, Chenchen
    Fan, Guisheng
    Yu, Huiqun
    Huang, Zijie
    Ding, Jianshu
    Guan, Yao
    SOFTWARE QUALITY JOURNAL, 2024, 32 (02) : 459 - 486
  • [3] Evaluating Defect Prediction Models for a Large Evolving Software System
    Mende, Thilo
    Koschke, Rainer
    Leszak, Marek
    13TH EUROPEAN CONFERENCE ON SOFTWARE MAINTENANCE AND REENGINEERING: CSMR 2009, PROCEEDINGS, 2009, : 247 - +
  • [4] node2defect: Using Network Embedding to Improve Software Defect Prediction
    Qu, Yu
    Liu, Ting
    Chi, Jianlei
    Jin, Yangxu
    Cui, Di
    He, Ancheng
    Zheng, Qinghua
    PROCEEDINGS OF THE 2018 33RD IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING (ASE' 18), 2018, : 844 - 849
  • [5] USING CLUSTERING TECHNIQUES GOES WITH GENETIC ALGORITHM TO IMPROVE SOFTWARE DEFECT PREDICTION
    Kuo, Chia-Hao
    Chang, Ching-Pao
    Lin, Yu-Shih
    Chu, Chih-Ping
    PROCEEDINGS OF THE 2011 3RD INTERNATIONAL CONFERENCE ON SOFTWARE TECHNOLOGY AND ENGINEERING (ICSTE 2011), 2011, : 165 - 170
  • [6] Using Coding-Based Ensemble Learning to Improve Software Defect Prediction
    Sun, Zhongbin
    Song, Qinbao
    Zhu, Xiaoyan
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 2012, 42 (06): : 1806 - 1817
  • [7] Class Balancing Approaches to Improve for Software Defect Prediction Estimations: A Comparative Study
    Sanchez-Garcia, Angel J.
    Limon, Xavier
    Dominguez-Isidro, Saul
    Olvera-Villeda, Dan Javier
    Perez-Arriaga, Juan Carlos
    PROGRAMMING AND COMPUTER SOFTWARE, 2024, 50 (08) : 621 - 647
  • [8] A Novel Approach to Improve Software Defect Prediction Accuracy Using Machine Learning
    Mehmood, Iqra
    Shahid, Sidra
    Hussain, Hameed
    Khan, Inayat
    Ahmad, Shafiq
    Rahman, Shahid
    Ullah, Najeeb
    Huda, Shamsul
    IEEE ACCESS, 2023, 11 : 63579 - 63597
  • [9] Research on Cross-Company Defect Prediction Method to Improve Software Security
    Shao, Yanli
    Zhao, Jingru
    Wang, Xingqi
    Wu, Weiwei
    Fang, Jinglong
    SECURITY AND COMMUNICATION NETWORKS, 2021, 2021
  • [10] Hellinger Net: A Hybrid Imbalance Learning Model to Improve Software Defect Prediction
    Chakraborty, Tanujit
    Chakraborty, Ashis Kumar
    IEEE TRANSACTIONS ON RELIABILITY, 2021, 70 (02) : 481 - 494