A novel approach for software defect prediction using CNN and GRU based on SMOTE Tomek method

Times cited: 16
Authors
Khleel, Nasraldeen Alnor Adam [1]
Nehez, Karoly [1]
Affiliation
[1] Univ Miskolc, Dept Informat Engn, H-3515 Miskolc, Hungary
Keywords
Software defect prediction (SDP); Software metrics; Deep learning (DL); CNN; GRU; Class imbalance; Sampling techniques; SMOTE Tomek; NEURAL-NETWORK; RULES
DOI
10.1007/s10844-023-00793-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Software defect prediction (SDP) plays a vital role in improving the quality of software projects and reducing maintenance risks by detecting defective software components. SDP uses historical defect data to model the relationship between software metrics and defects with a variety of methods. Many prediction models based on machine learning (ML) and deep learning (DL) have been developed to identify defective software modules, and many methodologies and frameworks have been proposed. Class imbalance is one of the most challenging problems these models face in binary classification: when the class distribution is imbalanced, accuracy may be high even though the models fail to recognize instances of the minority class, which leads to weak classification. So far, few previous studies have addressed the class imbalance problem in SDP. In this study, a data sampling method is introduced to address class imbalance and improve the performance of ML models in SDP. The proposed approach uses a convolutional neural network (CNN) and a gated recurrent unit (GRU), each combined with the synthetic minority oversampling technique plus Tomek links (SMOTE Tomek), to predict software defects. To establish the effectiveness of the proposed models, experiments were conducted on benchmark datasets from the PROMISE repository. The results were evaluated and compared in terms of accuracy, precision, recall, F-measure, Matthews correlation coefficient (MCC), area under the ROC curve (AUC), area under the precision-recall curve (AUCPR), and mean squared error (MSE). The proposed models predicted software defects more effectively on the balanced datasets than on the original datasets, with AUC improvements of up to 19% for the CNN model and 24% for the GRU model.
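The SMOTE Tomek resampling step named above can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation. It makes the following choices:

- Labels are binary.
- Distance is Euclidean.
- The minority class is oversampled until it matches the majority count.
- Both members of every Tomek link (two opposite-class samples that are each other's nearest neighbour) are removed. Some implementations drop only the majority-class member.

The function names `smote`, `tomek_links` and `smote_tomek` and the toy data are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Optional, Sequence, Set, Tuple

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 5,
          rng: Optional[random.Random] = None) -> List[Vector]:
    """Create n_new synthetic minority samples, each interpolated between a
    random minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    if rng is None:
        rng = random.Random(0)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base sample.
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: euclidean(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # random position along the base->neighbour segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def tomek_links(X: Sequence[Sequence[float]], y: Sequence[int]) -> Set[int]:
    """Indices of samples in a Tomek link: opposite-class mutual nearest neighbours."""
    nearest = [min((j for j in range(len(X)) if j != i),
                   key=lambda j: euclidean(X[i], X[j]))
               for i in range(len(X))]
    linked: Set[int] = set()
    for i, j in enumerate(nearest):
        if y[i] != y[j] and nearest[j] == i:
            linked.update((i, j))
    return linked


def smote_tomek(X: Sequence[Sequence[float]], y: Sequence[int], k: int = 5,
                seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Oversample the minority class to the majority count with SMOTE,
    then remove both members of every Tomek link."""
    counts = Counter(y)
    minority_label = min(counts, key=counts.get)
    majority_label = max(counts, key=counts.get)
    minority = [list(x) for x, label in zip(X, y) if label == minority_label]
    n_new = counts[majority_label] - counts[minority_label]
    synthetic = smote(minority, n_new, k, random.Random(seed))
    X_res = [list(x) for x in X] + synthetic
    y_res = list(y) + [minority_label] * len(synthetic)
    drop = tomek_links(X_res, y_res)
    kept = [i for i in range(len(X_res)) if i not in drop]
    return [X_res[i] for i in kept], [y_res[i] for i in kept]


if __name__ == "__main__":
    # Hypothetical toy data: two metric values per module, label 1 = defective.
    X = [[10, 1], [12, 2], [11, 1], [13, 2], [14, 3], [12, 1], [11, 2], [15, 2],
         [40, 9], [42, 8]]
    y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
    X_bal, y_bal = smote_tomek(X, y, k=1)
    print("before:", Counter(y), "after:", Counter(y_bal))
```

Each Tomek link pairs one sample from each class, so removing both members keeps the classes balanced after SMOTE. Resampling of this kind is normally fitted on the training split only. For real experiments, the imbalanced-learn library offers `SMOTETomek` in `imblearn.combine`, which is more complete and much faster than this quadratic sketch.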
We compared our proposed approach with existing SDP approaches using several standard performance measures. The comparison results demonstrated that the proposed approach significantly outperforms existing state-of-the-art SDP approaches on most datasets.
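The performance measures listed in the abstract can be computed from predicted defect probabilities, as in the minimal sketch below. It is not the authors' evaluation code and rests on these assumptions:

- Label 1 marks a defective module.
- The decision threshold is 0.5.
- AUC is the Mann-Whitney probability that a defective module scores above a clean one, with ties counting half.
- AUCPR is approximated by average precision, one common estimator.
- MSE is taken between the predicted probabilities and the 0/1 labels.

The paper may define some of these differently, and the function names `binary_metrics`, `roc_auc` and `average_precision` are hypothetical.

```python
import math
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (defective, clean) pairs ranked correctly; ties count half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mean precision at each rank where a defective module is retrieved."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def binary_metrics(y_true: Sequence[int], scores: Sequence[float],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Confusion-matrix metrics at a threshold plus threshold-free AUC, AUCPR and MSE."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp, tn = pairs.count((1, 1)), pairs.count((0, 0))
    fp, fn = pairs.count((0, 1)), pairs.count((1, 0))

    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0  # define 0/0 as 0 for degenerate cases

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f_measure": ratio(2 * precision * recall, precision + recall),
        "mcc": ratio(tp * tn - fp * fn,
                     math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))),
        "auc": roc_auc(y_true, scores),
        "aucpr": average_precision(y_true, scores),
        "mse": sum((s - t) ** 2 for s, t in zip(scores, y_true)) / len(pairs),
    }


if __name__ == "__main__":
    print(binary_metrics([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))
```

For the four-module example, accuracy, precision, recall and F-measure are all 0.5, MCC is 0 and AUC is 0.75. The threshold-free AUC stays informative even when accuracy is not, which is why imbalance studies report it alongside accuracy.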
Pages: 673-707
Number of pages: 35