Empirical Evaluation of the Impact of Class Overlap on Software Defect Prediction

Cited by: 32
Authors
Gong, Lina [1 ,2 ,3 ]
Jiang, Shujuan [1 ,2 ]
Wang, Rongcun [1 ,2 ]
Jiang, Li [1 ,2 ]
Affiliations
[1] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou 221116, Jiangsu, Peoples R China
[2] Minist Educ, Mine Digitizat Engn Res Ctr, Xuzhou 221116, Jiangsu, Peoples R China
[3] Zaozhuang Univ, Dept Informat Sci & Engn, Zaozhuang 277160, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Class overlap; Software defect prediction; K-Means clustering; Machine learning
DOI
10.1109/ASE.2019.00071
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Software defect prediction (SDP) uses learning models to detect defective modules in a project, and the performance of these models depends on the quality of the training data. Previous research has mainly focused on two quality problems: class imbalance and feature redundancy. However, training data often contains instances that belong to different classes but have similar feature values; this class overlap also degrades the quality of the training data. Our goal is to investigate the impact of class overlap on software defect prediction. We also propose an improved K-Means clustering cleaning approach (IKMCCA) that addresses both the class overlap and class imbalance problems. Specifically, we check whether the K-Means clustering cleaning approach (KMCCA), neighborhood cleaning learning (NCL), or IKMCCA can improve defect detection performance in two cases: (i) within-project defect prediction (WPDP) and (ii) cross-project defect prediction (CPDP). To obtain an objective estimate of the impact of class overlap, we carry out our investigation on 28 open-source projects and compare the performance of state-of-the-art learning models in both cases when using IKMCCA, KMCCA, or NCL versus using uncleaned data. The experimental results show that learning models achieve significantly better performance in terms of balance, Recall, and AUC for both WPDP and CPDP when the overlapping instances are removed. Moreover, it is better to address class overlap and class imbalance together.
Pages: 710 - 721
Page count: 12
Related papers
50 in total
  • [1] A Comprehensive Investigation of the Impact of Class Overlap on Software Defect Prediction
    Gong, Lina
    Zhang, Haoxiang
    Zhang, Jingxuan
    Wei, Mingqiang
    Huang, Zhiqiu
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2023, 49 (04) : 2440 - 2458
  • [2] Tackling class overlap and imbalance problems in software defect prediction
    Lin Chen
    Bin Fang
    Zhaowei Shang
    Yuanyan Tang
    Software Quality Journal, 2018, 26 : 97 - 125
  • [3] Tackling class overlap and imbalance problems in software defect prediction
    Chen, Lin
    Fang, Bin
    Shang, Zhaowei
    Tang, Yuanyan
    SOFTWARE QUALITY JOURNAL, 2018, 26 (01) : 97 - 125
  • [4] An ensemble model for addressing class imbalance and class overlap in software defect prediction
    Dar, Abdul Waheed
    Farooq, Sheikh Umar
    INTERNATIONAL JOURNAL OF SYSTEM ASSURANCE ENGINEERING AND MANAGEMENT, 2024, 15 (12) : 5584 - 5603
  • [5] ROCT: Radius-based Class Overlap Cleaning Technique to Alleviate the Class Overlap Problem in Software Defect Prediction
    Feng, Shuo
    Keung, Jacky
    Liu, Jie
    Xiao, Yan
    Yu, Xiao
    Zhang, Miao
    2021 IEEE 45TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE (COMPSAC 2021), 2021 : 228 - 237
  • [6] A Software Defect Prediction Method That Simultaneously Addresses Class Overlap and Noise Issues after Oversampling
    Wang, Renliang
    Liu, Feng
    Bai, Yanhui
    ELECTRONICS, 2024, 13 (20)
  • [7] The Impact Study of Class Imbalance on the Performance of Software Defect Prediction Models
    Yu Q.
    Jiang S.-J.
    Zhang Y.-M.
    Wang X.-Y.
    Gao P.-F.
    Qian J.-Y.
    Qian, Jun-Yan (qjy2000@gmail.com), 2018, Science Press (41) : 809 - 824
  • [8] Handling class overlap and imbalance using overlap driven under-sampling with balanced random forest in software defect prediction
    Dar, Abdul Waheed
    Farooq, Sheikh Umar
    INNOVATIONS IN SYSTEMS AND SOFTWARE ENGINEERING, 2024
  • [9] An Empirical Study of the Impact of Class Overlap on the Performance and Interpretability of Cross-Version Defect Prediction
    Han, Hui
    Yu, Qiao
    Zhu, Yi
    Cheng, Shengyi
    Zhang, Yu
    INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, 2024, 34 (12) : 1895 - 1918
  • [10] Classifier Evaluation for Software Defect Prediction
    Kou, Gang
    Peng, Yi
    Shi, Yong
    Wu, Wenshuai
    STUDIES IN INFORMATICS AND CONTROL, 2012, 21 (02) : 117 - 126