A novel approach for software defect prediction using CNN and GRU based on SMOTE Tomek method

Cited by: 16
Authors
Khleel, Nasraldeen Alnor Adam [1]
Nehez, Karoly [1]
Affiliations
[1] Univ Miskolc, Dept Informat Engn, H-3515 Miskolc, Hungary
Keywords
Software defect prediction (SDP); Software metrics; Deep learning (DL); CNN; GRU; Class imbalance; Sampling techniques; SMOTE Tomek; NEURAL-NETWORK; RULES
DOI
10.1007/s10844-023-00793-1
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Software defect prediction (SDP) plays a vital role in improving the quality of software projects and reducing maintenance risk by detecting defective software components. SDP uses historical defect data to model the relationship between software metrics and defects. Many prediction models based on machine learning (ML) and deep learning (DL) have been developed to identify defective software modules, along with numerous methodologies and frameworks. Class imbalance is one of the most challenging problems these models face in binary classification. When the class distribution is imbalanced, accuracy may be high even though the model fails to recognize instances of the minority class, which leads to weak classification. Few previous studies have addressed the class imbalance problem in SDP. In this study, a data sampling method is introduced to address class imbalance and improve the performance of ML models in SDP. The proposed approach is based on a convolutional neural network (CNN) and a gated recurrent unit (GRU) combined with the synthetic minority oversampling technique plus Tomek links (SMOTE Tomek) to predict software defects. To establish the efficiency of the proposed models, experiments were conducted on benchmark datasets from the PROMISE repository. The results were evaluated in terms of accuracy, precision, recall, F-measure, Matthews correlation coefficient (MCC), area under the ROC curve (AUC), area under the precision-recall curve (AUCPR), and mean square error (MSE). The proposed models predicted software defects more effectively on the balanced datasets than on the original datasets, with AUC improvements of up to 19% for the CNN model and 24% for the GRU model.
We compared our proposed approach with existing SDP approaches based on several standard performance measures. The comparison results demonstrated that the proposed approach significantly outperforms existing state-of-the-art SDP approaches on most datasets.
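The resampling step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a pure-Python toy version of the general SMOTE Tomek idea (oversample the minority class by interpolating between minority neighbours, then remove Tomek links, i.e. pairs of mutual nearest neighbours with different labels). The function names, the toy data, the choice of `k`, and the decision to drop both members of each link are all assumptions made for illustration. The CNN and GRU classifiers are not covered here.

```python
import math
import random
from collections import Counter


def nearest(idx, points, candidates, k=1):
    """Indices of the k points in `candidates` closest to points[idx]."""
    others = [j for j in candidates if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def smote(X, y, minority=1, k=3, seed=0):
    """Add synthetic minority samples until both classes are the same size.

    Assumes binary labels and at least two minority samples.
    """
    rng = random.Random(seed)
    X, y = list(X), list(y)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_needed = (len(y) - len(min_idx)) - len(min_idx)
    for _ in range(max(n_needed, 0)):
        i = rng.choice(min_idx)                   # a real minority sample
        j = rng.choice(nearest(i, X, min_idx, k))  # one of its minority neighbours
        gap = rng.random()                         # position along the segment i -> j
        X.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
        y.append(minority)
    return X, y


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link (assumed cleaning strategy)."""
    all_idx = range(len(X))
    nn = [nearest(i, X, all_idx)[0] for i in all_idx]
    drop = {i for i in all_idx if nn[nn[i]] == i and y[i] != y[nn[i]]}
    keep = [i for i in all_idx if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical two-metric dataset: 9 non-defective modules (label 0),
# 3 defective modules (label 1), with one label-0 point near the minority cluster.
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.4), (0.5, 0.2), (0.4, 0.6), (0.7, 0.5),
     (0.3, 0.9), (0.9, 0.1), (1.9, 2.0),
     (2.0, 2.0), (2.2, 1.9), (1.9, 2.3)]
y = [0] * 9 + [1] * 3

X_bal, y_bal = smote(X, y)
X_res, y_res = remove_tomek_links(X_bal, y_bal)
print("original:   ", Counter(y))
print("after SMOTE:", Counter(y_bal))
print("after Tomek:", Counter(y_res))
```

In practice, resampling of this kind is normally applied to the training split only, so that evaluation metrics such as AUC and MCC are still measured on data with the original class distribution.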
Pages: 673-707
Page count: 35