An Empirical Study of the Impact of Class Overlap on the Performance and Interpretability of Cross-Version Defect Prediction

Cited by: 0
Authors
Han, Hui [1]
Yu, Qiao [1]
Zhu, Yi [1]
Cheng, Shengyi [1]
Zhang, Yu [1]
Affiliation
[1] Jiangsu Normal Univ, Sch Comp Sci & Technol, Xuzhou 221116, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Software defect prediction; cross-version defect prediction; class overlap
DOI
10.1142/S0218194024500414
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The class overlap problem refers to instances from different categories overlapping heavily in the feature space. It is one of the obstacles to improving the performance of software defect prediction (SDP). Existing studies on the impact of class overlap on SDP have focused mainly on within-project and cross-project defect prediction. Moreover, existing methods for cleaning class-overlap instances are not suitable for cross-version defect prediction. In this paper, we propose a class-overlap instance cleaning method based on the Ratio of K-nearest neighbors with the Same Label (RKSL). This method removes instances with an abnormal neighbor ratio from the training set. Based on the RKSL method, we investigate the impact of class overlap on the performance and interpretability of cross-version defect prediction models. The experimental results show that class overlap can significantly affect the performance of cross-version defect prediction models. The RKSL method can handle the class overlap problem in defect datasets, but it may affect the interpretability of the models. From an analysis of feature changes, we conclude that cleaning class-overlap instances can help models identify more important features.
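The abstract describes RKSL only at a high level, so the following is a rough sketch of the general idea rather than the authors' implementation. It assumes Euclidean distance, a hypothetical neighborhood size `k`, and a hypothetical cutoff `min_ratio` below which an instance's same-label neighbor ratio is treated as "abnormal"; none of these choices is stated in the record, and the function names are illustrative.

```python
import numpy as np


def same_label_neighbor_ratio(X, y, k=5):
    """Fraction of each instance's k nearest neighbors that share its label.

    Assumption: Euclidean distance over raw features (not confirmed by the source).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances between all training instances.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    # An instance must not count as its own neighbor.
    np.fill_diagonal(dist, np.inf)
    # Indices of the k closest other instances for every row.
    nn_idx = np.argsort(dist, axis=1)[:, :k]
    return (y[nn_idx] == y[:, None]).mean(axis=1)


def clean_overlap(X, y, k=5, min_ratio=0.5):
    """Drop training instances whose same-label neighbor ratio is below min_ratio.

    Assumption: "abnormal" is modeled as a simple lower threshold (hypothetical).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    ratio = same_label_neighbor_ratio(X, y, k)
    keep = ratio >= min_ratio
    return X[keep], y[keep], ratio
```

In a cross-version setting, a filter of this kind would be applied only to the training data from the earlier version, leaving the later version's test data untouched.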
Pages: 1895-1918
Page count: 24
Related papers
50 in total
  • [31] An ensemble model for addressing class imbalance and class overlap in software defect prediction
    Dar, Abdul Waheed
    Farooq, Sheikh Umar
    INTERNATIONAL JOURNAL OF SYSTEM ASSURANCE ENGINEERING AND MANAGEMENT, 2024, 15 (12) : 5584 - 5603
  • [32] An Empirical Study of Classifier Combination for Cross-Project Defect Prediction
    Zhang, Yun
    Lo, David
    Xia, Xin
    Sun, Jianling
    39TH ANNUAL IEEE COMPUTERS, SOFTWARE AND APPLICATIONS CONFERENCE (COMPSAC 2015), VOL 2, 2015, : 264 - 269
  • [33] The Impact of Class Rebalancing Techniques on the Performance and Interpretation of Defect Prediction Models
    Tantithamthavorn, Chakkrit
    Hassan, Ahmed E.
    Matsumoto, Kenichi
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2020, 46 (11) : 1200 - 1219
  • [34] An empirical study to investigate the impact of data resampling techniques on the performance of class maintainability prediction models
    Malhotra, Ruchika
    Lata, Kusum
    NEUROCOMPUTING, 2021, 459 : 432 - 453
  • [35] Impact of Feature Selection Methods on the Predictive Performance of Software Defect Prediction Models: An Extensive Empirical Study
    Balogun, Abdullateef O.
    Basri, Shuib
    Mahamad, Saipunidzam
    Abdulkadir, Said J.
    Almomani, Malek A.
    Adeyemo, Victor E.
    Al-Tashi, Qasem
    Mojeed, Hammed A.
    Imam, Abdullahi A.
    Bajeh, Amos O.
    SYMMETRY-BASEL, 2020, 12 (07):
  • [36] Balancing Predictive Performance and Interpretability in Machine Learning: A Scoring System and an Empirical Study in Traffic Prediction
    Obster, Fabian
    Ciolacu, Monica I.
    Humpe, Andreas
    IEEE ACCESS, 2024, 12 : 195613 - 195628
  • [37] ROCT: Radius-based Class Overlap Cleaning Technique to Alleviate the Class Overlap Problem in Software Defect Prediction
    Feng, Shuo
    Keung, Jacky
    Liu, Jie
    Xiao, Yan
    Yu, Xiao
    Zhang, Miao
    2021 IEEE 45TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE (COMPSAC 2021), 2021, : 228 - 237
  • [38] Combined classifier for cross-project defect prediction: an extended empirical study
    Zhang, Yun
    Lo, David
    Xia, Xin
    Sun, Jianling
    Frontiers of Computer Science, 2018, 12 : 280 - 296
  • [39] An Empirical Study on the Effectiveness of Feature Selection for Cross-Project Defect Prediction
    Yu, Qiao
    Qian, Junyan
    Jiang, Shujuan
    Wu, Zhenhua
    Zhang, Gongjie
    IEEE ACCESS, 2019, 7 : 35710 - 35718
  • [40] An Empirical Study of Software Metrics Diversity for Cross-Project Defect Prediction
    Zhong, Y.
    Song, K.
    Lv, S.
    He, P.
    Mathematical Problems in Engineering, 2021, 2021