Empirical Evaluation of the Impact of Class Overlap on Software Defect Prediction

Times cited: 32
Authors
Gong, Lina [1 ,2 ,3 ]
Jiang, Shujuan [1 ,2 ]
Wang, Rongcun [1 ,2 ]
Jiang, Li [1 ,2 ]
Affiliations
[1] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou 221116, Jiangsu, Peoples R China
[2] Minist Educ, Mine Digitizat Engn Res Ctr, Xuzhou 221116, Jiangsu, Peoples R China
[3] Zaozhuang Univ, Dept Informat Sci & Engn, Zaozhuang 277160, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Class overlap; Software defect prediction; K-Means clustering; Machine learning; MACHINE
DOI
10.1109/ASE.2019.00071
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812 ;
Abstract
Software defect prediction (SDP) uses learning models to detect the defective modules in a project, and the performance of these models depends on the quality of the training data. Previous research has mainly focused on two data-quality problems: class imbalance and feature redundancy. However, training data often contain instances that belong to different classes but have similar feature values; this causes class overlap, which degrades the quality of the training data. Our goal is to investigate the impact of class overlap on software defect prediction. We also propose an improved K-Means clustering cleaning approach (IKMCCA) that addresses both the class overlap and the class imbalance problems. Specifically, we check whether the K-Means clustering cleaning approach (KMCCA), neighborhood cleaning learning (NCL), or IKMCCA improves defect detection performance in two settings: (i) within-project defect prediction (WPDP) and (ii) cross-project defect prediction (CPDP). To obtain an objective estimate of the effect of class overlap, we carry out our investigation on 28 open-source projects and, for both settings, compare the performance of state-of-the-art learning models trained on data cleaned with IKMCCA, KMCCA, or NCL vs. data without cleaning. The experimental results show that the learning models achieve significantly better performance in terms of balance, Recall, and AUC for both WPDP and CPDP when the overlapping instances are removed. Moreover, it is better to address class overlap and class imbalance together.
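The abstract does not say how KMCCA or IKMCCA decide which instances to remove, so the following is only a rough, stdlib-only Python sketch of the general idea of K-Means-based overlap cleaning, together with the Recall (pd) and balance measures the abstract reports. The filtering rule (within each cluster, drop instances whose label disagrees with the cluster majority when that majority holds at least `purity` of the members), the farthest-first seeding, and the names `kmeans`, `clean_overlap`, and `recall_pf_balance` are all assumptions for illustration; this is not the authors' KMCCA or IKMCCA. Balance follows the definition commonly used in SDP, 1 - sqrt(pf^2 + (1 - pd)^2) / sqrt(2).

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - z) ** 2 for x, z in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-Means with farthest-first seeding; returns a cluster id per point."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from every seed chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def clean_overlap(X, y, k=2, purity=0.6):
    """Hypothetical overlap filter (not the paper's KMCCA/IKMCCA).

    y holds 1 for defective and 0 for clean modules. In every cluster whose
    majority label covers at least `purity` of its members, the members that
    carry the other label are treated as overlapping and dropped.
    """
    labels = kmeans(X, k)
    keep = []
    for c in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        defective = sum(y[i] for i in idx)
        majority = 1 if defective * 2 > len(idx) else 0
        share = max(defective, len(idx) - defective) / len(idx)
        for i in idx:
            if share < purity or y[i] == majority:
                keep.append(i)
    keep.sort()  # preserve the original instance order
    return [X[i] for i in keep], [y[i] for i in keep]


def recall_pf_balance(y_true, y_pred):
    """Return (pd, pf, balance) for binary labels where 1 means defective."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pd = tp / (tp + fn) if tp + fn else 0.0  # Recall, probability of detection
    pf = fp / (fp + tn) if fp + tn else 0.0  # probability of false alarm
    balance = 1 - math.sqrt(pf ** 2 + (1 - pd) ** 2) / math.sqrt(2)
    return pd, pf, balance
```

Note that a filter like this one only removes instances and can make class imbalance worse when the dropped instances are mostly defective; handling both problems together is what the abstract says IKMCCA is designed for.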
Pages: 710-721
Page count: 12