Handling class overlap and imbalance using overlap driven under-sampling with balanced random forest in software defect prediction

Cited by: 1
Authors
Dar, Abdul Waheed [1]
Farooq, Sheikh Umar [1]
Affiliations
[1] Univ Kashmir, Dept Comp Sci, North Campus, Srinagar, India
Keywords
Class imbalance problem; Machine learning; Software defect prediction; Over-sampling; Under-sampling; PERFORMANCE; MACHINE; SMOTE
DOI
10.1007/s11334-024-00571-4
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Various machine learning techniques have been used to build software defect prediction (SDP) models that identify defective software modules. However, a major challenge for SDP models is the class overlap and class imbalance in SDP datasets. This study proposes a new SDP model that combines an overlap-based under-sampling framework with the balanced random forest classifier to improve the identification of defective software modules. First, duplicate instances are removed from the dataset to avoid over-fitting. Next, an overlap-based under-sampling technique removes the overlapped majority (non-defective) class instances from the training data, which maximizes the presence of minority (defective) class instances in the region where the two classes overlap. Finally, the balanced random forest, which combines random under-sampling with ensemble learning, is trained on the pre-processed training data to perform the classification. The efficacy of the proposed SDP model is assessed by comparing its performance against nine state-of-the-art SDP models on 15 imbalanced software defect datasets. Experimental results and statistical analysis indicate that the proposed SDP model achieves better predictive performance than the other tested models in terms of recall, G-mean, F-measure, and AUC.
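The three pre-processing steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the overlap criterion used here (drop any majority instance that is among the k nearest neighbours of a minority instance) is an assumed stand-in for the paper's overlap-based under-sampling framework, and `balanced_bootstrap` shows only the per-tree sampling idea behind a balanced random forest, not the forest itself. All function names, the toy data, and the value of `k` are hypothetical.

```python
import math
import random


def remove_duplicates(rows):
    """Drop exact duplicate (features, label) rows, keeping the first occurrence."""
    seen, unique = set(), []
    for features, label in rows:
        key = (tuple(features), label)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def overlap_undersample(rows, k=1, minority=1):
    """Assumed overlap rule: remove majority rows that fall among the
    k nearest neighbours (Euclidean distance) of any minority row."""
    to_drop = set()
    for i, (xi, yi) in enumerate(rows):
        if yi != minority:
            continue
        neighbours = sorted(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: math.dist(xi, rows[j][0]),
        )[:k]
        to_drop.update(j for j in neighbours if rows[j][1] != minority)
    return [row for j, row in enumerate(rows) if j not in to_drop]


def balanced_bootstrap(rows, rng, minority=1):
    """Draw a bootstrap sample of the minority class and an equally sized
    sample (with replacement) of the majority class."""
    minority_rows = [r for r in rows if r[1] == minority]
    majority_rows = [r for r in rows if r[1] != minority]
    n = len(minority_rows)
    return ([rng.choice(minority_rows) for _ in range(n)]
            + [rng.choice(majority_rows) for _ in range(n)])


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of recall (defective class) and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    if tp + fn == 0 or tn + fp == 0:
        return 0.0
    return math.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))


# Toy training data: label 1 = defective (minority), 0 = non-defective.
data = [
    ((1.0, 1.0), 0), ((1.0, 1.0), 0),   # exact duplicate
    ((1.2, 0.9), 0), ((0.8, 1.1), 0),
    ((3.0, 3.0), 0),                    # majority point inside the minority region
    ((3.2, 3.2), 1), ((5.0, 5.0), 1),
]

deduped = remove_duplicates(data)
cleaned = overlap_undersample(deduped, k=1)
sample = balanced_bootstrap(cleaned, random.Random(0))
score = g_mean([1, 1, 0, 0], [1, 0, 0, 0])
print(len(deduped), len(cleaned), len(sample), round(score, 4))
```

In practice each balanced sample would be used to grow one tree of the ensemble. A library implementation of the classifier would normally replace the hand-written sampling shown here.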
Pages: 21