ROCT: Radius-based Class Overlap Cleaning Technique to Alleviate the Class Overlap Problem in Software Defect Prediction

Cited by: 5
Authors
Feng, Shuo [1]
Keung, Jacky [1]
Liu, Jie [2]
Xiao, Yan [3]
Yu, Xiao [4]
Zhang, Miao [1]
Affiliations
[1] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
[2] Wuhan Univ, Sch Comp Sci, Wuhan, Peoples R China
[3] Natl Univ Singapore, Sch Comp, Singapore, Singapore
[4] Wuhan Univ Technol, Sch Comp Sci & Technol, Wuhan, Peoples R China
Keywords
Class Overlap; Class Imbalance; Data Preprocessing; Software Defect Prediction; EFFECT SIZE;
DOI
10.1109/COMPSAC51774.2021.00041
Chinese Library Classification
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The training data commonly used in software defect prediction (SDP) usually contain instances that have similar feature values but belong to different classes, which significantly degrades the performance of prediction models trained on them. This is referred to as the class overlap problem (COP). Previous studies have concluded that COP has a more negative impact on the performance of prediction models than the class imbalance problem (CIP). However, less research has been conducted on COP than on CIP. Moreover, the performance of existing class overlap cleaning techniques relies heavily on hyperparameter settings, such as the value of K in the K-nearest neighbor algorithm or the K-means algorithm, and finding the optimal hyperparameters remains a challenge. In this study, we propose a novel technique named the radius-based class overlap cleaning technique (ROCT) to better alleviate COP in SDP without tuning hyperparameters. The basic idea of ROCT is to take each instance as the center of a hypersphere and directly optimize the radius of the hypersphere. ROCT then identifies the instances inside the hypersphere whose label is opposite to that of the center instance as overlapping instances and removes them. To investigate the performance of ROCT, we conduct an empirical experiment on 29 datasets collected from various software repositories, using the K-nearest neighbor, random forest, logistic regression, and naive Bayes classifiers and measuring performance by AUC, balance, pd, and pf. The experimental results show that ROCT performs the best and significantly improves the performance of prediction models by as much as 15.2% and 29.9% in terms of AUC and balance, respectively, compared with the existing class overlap cleaning techniques. The superior performance of ROCT indicates that it should be recommended as an efficient alternative for alleviating COP in SDP.
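The cleaning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `radius_overlap_clean` is hypothetical, and the per-center radii are assumed to be supplied by the caller, because the abstract does not say how ROCT optimizes them. The sketch only shows the removal rule: any instance that lies inside a hypersphere centered on an instance of the opposite class is dropped.

```python
import math


def radius_overlap_clean(X, y, radii):
    """Drop instances that fall inside a hypersphere centered on an
    instance with the opposite label.

    X     : list of feature vectors (lists of floats)
    y     : list of class labels, one per row of X
    radii : list of per-center radii (ASSUMED to be given; the abstract
            does not describe how ROCT optimizes them)
    """
    flagged = set()
    for i, (center, r) in enumerate(zip(X, radii)):
        for j, point in enumerate(X):
            # Skip the center itself and instances of the same class.
            if i == j or y[i] == y[j]:
                continue
            # Opposite-label instance inside the hypersphere: overlapping.
            if math.dist(center, point) <= r:
                flagged.add(j)
    keep = [k for k in range(len(X)) if k not in flagged]
    return [X[k] for k in keep], [y[k] for k in keep]


# Toy data: one defective instance (label 1) with a non-defective
# neighbor (label 0) very close to it, plus two distant instances.
X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.2]]
y = [1, 0, 0, 0]
radii = [0.5, 0.0, 0.0, 0.0]  # illustrative values only

X_clean, y_clean = radius_overlap_clean(X, y, radii)
```

With these illustrative radii, only the second instance is removed, since it is the only one that sits inside a hypersphere centered on an instance of the other class. The double loop is quadratic in the number of instances; a real implementation would likely use a spatial index or vectorized distances.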
Pages: 228-237
Number of pages: 10
Related papers
33 in total
  • [21] An Empirical Study on Data Sampling Methods in Addressing Class Imbalance Problem in Software Defect Prediction
    Odejide, Babajide J.
    Bajeh, Amos O.
    Balogun, Abdullateef O.
    Alanamu, Zubair O.
    Adewole, Kayode S.
    Akintola, Abimbola G.
    Salihu, Shakirat A.
    Usman-Hamza, Fatima E.
    Mojeed, Hammed A.
    SOFTWARE ENGINEERING PERSPECTIVES IN SYSTEMS, VOL. 1, 2022, 501 : 594 - 610
  • [22] Tackling Class Imbalance Problem in Software Defect Prediction Through Cluster-Based Over-Sampling With Filtering
    Gong, Lina
    Jiang, Shujuan
    Jiang, Li
    IEEE ACCESS, 2019, 7 : 145725 - 145737
  • [23] A novel software defect prediction based on atomic class-association rule mining
    Shao, Yuanxun
    Liu, Bin
    Wang, Shihai
    Li, Guoqi
    EXPERT SYSTEMS WITH APPLICATIONS, 2018, 114 : 237 - 254
  • [24] IFCM: An improved Fuzzy C-means clustering method to handle Class Overlap on Aging-related Software Bug Prediction
    Zhang, Chen
    Feng, Shuo
    Xie, Wenzhi
    Zhao, Dongdong
    Xiang, Jianwen
    Pietrantuono, Roberto
    Natella, Roberto
    Cotroneo, Domenico
    2023 IEEE 34TH INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING, ISSRE, 2023, : 590 - 600
  • [25] Software defect prediction using a cost sensitive decision forest and voting, and a potential solution to the class imbalance problem
    Siers, Michael J.
    Islam, Md Zahidul
    INFORMATION SYSTEMS, 2015, 51 : 62 - 71
  • [26] Proficient 3-class classification model for confident overlap value based fuzzified aquatic information extracted tsunami prediction
    Jain, Nikita
    Virmani, Deepali
    Abraham, Ajith
    INTELLIGENT DECISION TECHNOLOGIES-NETHERLANDS, 2019, 13 (03): : 295 - 303
  • [28] Genetic algorithm-based oversampling approach to prune the class imbalance issue in software defect prediction
    Arun, C.
    Lakshmi, C.
    SOFT COMPUTING, 2022, 26 (23) : 12915 - 12931
  • [29] Software bug priority prediction technique based on intuitionistic fuzzy representation and class imbalance learning
    Panda, Rama Ranjan
    Nagwani, Naresh Kumar
    KNOWLEDGE AND INFORMATION SYSTEMS, 2024, 66 (03) : 2135 - 2164