Handling class overlap and imbalance using overlap driven under-sampling with balanced random forest in software defect prediction

Cited by: 1
Authors
Dar, Abdul Waheed [1]
Farooq, Sheikh Umar [1]
Affiliations
[1] Univ Kashmir, Dept Comp Sci, North Campus, Srinagar, India
Keywords
Class imbalance problem; Machine learning; Software defect prediction; Over-sampling; Under-sampling; PERFORMANCE; MACHINE; SMOTE;
DOI
10.1007/s11334-024-00571-4
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Various machine learning techniques have been used to build software defect prediction (SDP) models that identify defective software modules. A major challenge for these models is that SDP datasets suffer from both class overlap and class imbalance. This study proposes a new SDP model that combines an overlap-based under-sampling framework with the balanced random forest classifier to improve the identification of defective software modules. First, duplicate instances are removed from the dataset to avoid over-fitting the model. Next, an overlap-based under-sampling technique removes the majority-class (non-defective) instances of the training data that lie in the region where the two classes overlap, which maximizes the presence of minority-class (defective) instances in that region. Finally, the balanced random forest, which combines random under-sampling with ensemble learning, is trained on the pre-processed training data to perform the classification. The efficacy of the proposed model is assessed by comparing its performance against nine state-of-the-art SDP models on 15 imbalanced software defect datasets. Experimental results and statistical analysis indicate that the proposed model achieves better predictive performance than the other tested models in terms of recall, G-mean, F-measure and AUC.
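The three pre-processing and sampling steps described in the abstract can be illustrated with a minimal sketch in plain Python. The abstract does not specify the exact overlap-detection rule, so the sketch assumes a simple k-nearest-neighbour rule as a stand-in: a non-defective instance is dropped when it is among the k nearest neighbours of a defective instance. The function names (`drop_duplicates`, `overlap_undersample`, `balanced_bootstrap`), the parameter `k`, and the toy data are all hypothetical and are not taken from the paper. The last function only shows the per-tree balanced sampling idea behind a balanced random forest, not a full classifier.

```python
import math
import random
from collections import Counter


def drop_duplicates(X, y):
    """Keep the first occurrence of each (features, label) pair."""
    seen, X_out, y_out = set(), [], []
    for row, label in zip(X, y):
        key = (tuple(row), label)
        if key not in seen:
            seen.add(key)
            X_out.append(list(row))
            y_out.append(label)
    return X_out, y_out


def overlap_undersample(X, y, k=2, minority=1):
    """Assumed stand-in rule: drop every majority instance that is among
    the k nearest neighbours (Euclidean) of some minority instance."""
    flagged = set()
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != minority:
            continue
        # All other instances, ordered by distance to the minority point.
        others = sorted((j for j in range(len(X)) if j != i),
                        key=lambda j: math.dist(xi, X[j]))
        flagged.update(j for j in others[:k] if y[j] != minority)
    keep = [j for j in range(len(X)) if j not in flagged]
    return [X[j] for j in keep], [y[j] for j in keep]


def balanced_bootstrap(X, y, rng, minority=1):
    """Bootstrap the minority class, then draw the same number of majority
    instances with replacement, giving one balanced sample per tree."""
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    n = len(min_idx)
    chosen = rng.choices(min_idx, k=n) + rng.choices(maj_idx, k=n)
    return [X[i] for i in chosen], [y[i] for i in chosen]


# Toy data: label 1 = defective (minority), label 0 = non-defective.
X = [[5.0, 5.0], [5.1, 5.0], [5.0, 5.2], [6.0, 6.0], [6.0, 6.0], [5.5, 5.5],
     [1.1, 1.0], [0.9, 1.1],   # non-defective points inside the overlap region
     [1.0, 1.0], [1.2, 1.0]]   # defective points
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

X_dedup, y_dedup = drop_duplicates(X, y)
X_clean, y_clean = overlap_undersample(X_dedup, y_dedup, k=2)
X_bag, y_bag = balanced_bootstrap(X_clean, y_clean, random.Random(0))

print("after de-duplication:", Counter(y_dedup))
print("after overlap under-sampling:", Counter(y_clean))
print("one balanced bootstrap:", Counter(y_bag))
```

In practice a library implementation of the classifier would replace `balanced_bootstrap`, and the under-sampling would be applied to the training split only, as the abstract states.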
Pages: 21