A novel approach for software defect prediction using CNN and GRU based on SMOTE Tomek method

Cited by: 16
Authors
Khleel, Nasraldeen Alnor Adam [1]
Nehez, Karoly [1]
Affiliation
[1] Univ Miskolc, Dept Informat Engn, H-3515 Miskolc, Hungary
Keywords
Software defect prediction (SDP); Software metrics; Deep learning (DL); CNN; GRU; Class imbalance; Sampling techniques; SMOTE Tomek; NEURAL-NETWORK; RULES
DOI
10.1007/s10844-023-00793-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Software defect prediction (SDP) plays a vital role in enhancing the quality of software projects and reducing maintenance-related risks by detecting defective software components. SDP uses historical defect data to model the relationship between software metrics and defects through diverse methodologies. Many prediction models based on machine learning (ML) and deep learning (DL) have been developed and adopted to recognize defective software modules, and many methodologies and frameworks have been presented. Class imbalance is one of the most challenging problems these models face in binary classification: when the class distribution is imbalanced, accuracy may be high, yet the models fail to recognize instances of the minority class, which leads to weak classification. So far, few previous studies have addressed the class imbalance problem in SDP. In this study, a data sampling method is introduced to address class imbalance and improve the performance of ML models in SDP. The proposed approach uses convolutional neural network (CNN) and gated recurrent unit (GRU) models combined with the synthetic minority oversampling technique plus Tomek links (SMOTE Tomek) to predict software defects. To establish the efficiency of the proposed models, experiments were conducted on benchmark datasets obtained from the PROMISE repository. The results were compared and evaluated in terms of accuracy, precision, recall, F-measure, Matthews correlation coefficient (MCC), the area under the ROC curve (AUC), the area under the precision-recall curve (AUCPR), and mean squared error (MSE). The experimental results showed that the proposed models predict software defects more effectively on the balanced datasets than on the original datasets, with AUC improvements of up to 19% for the CNN model and 24% for the GRU model.
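SMOTE Tomek, as commonly described, first oversamples the minority (defective) class with SMOTE, where each synthetic point is interpolated between a minority sample and one of its k nearest minority neighbours, and then cleans the class boundary by deleting Tomek links, i.e. pairs of opposite-class points that are each other's nearest neighbour. The pure-Python sketch below illustrates that two-step idea on toy data. It is not the authors' implementation: the function names, k, the 1:1 target ratio, and removing both points of every link (some variants drop only the majority point) are assumptions. In practice a library such as imbalanced-learn (`imblearn.combine.SMOTETomek`) would normally be used instead.

```python
import math
import random


def _dist(a, b):
    # Euclidean distance between two feature vectors of equal length
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _nearest(X, i):
    # index of the nearest neighbour of X[i], excluding i itself (O(n) scan)
    return min((j for j in range(len(X)) if j != i), key=lambda j: _dist(X[i], X[j]))


def smote(X_min, n_new, k=5, rng=None):
    """SMOTE step: create n_new synthetic minority samples, each interpolated
    between a random minority sample and one of its k nearest minority
    neighbours. Assumes at least two minority samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(X_min))
        base = X_min[i]
        neighbours = sorted(
            (j for j in range(len(X_min)) if j != i),
            key=lambda j: _dist(base, X_min[j]),
        )[:k]
        other = X_min[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def tomek_links(X, y):
    """Tomek step: indices of points in a Tomek link, i.e. pairs (i, j) with
    different labels that are each other's nearest neighbour."""
    nn = [_nearest(X, i) for i in range(len(X))]
    linked = set()
    for i, j in enumerate(nn):
        if y[i] != y[j] and nn[j] == i:
            linked.update((i, j))
    return linked


def smote_tomek(X, y, minority=1, k=5, seed=0):
    """Hypothetical helper: oversample the minority class to a 1:1 ratio with
    SMOTE, then delete both members of every Tomek link."""
    rng = random.Random(seed)
    X_min = [x for x, label in zip(X, y) if label == minority]
    n_new = max(sum(1 for label in y if label != minority) - len(X_min), 0)
    X_syn = smote(X_min, n_new, k=k, rng=rng)
    X_bal = list(X) + X_syn
    y_bal = list(y) + [minority] * len(X_syn)
    drop = tomek_links(X_bal, y_bal)
    X_out = [x for i, x in enumerate(X_bal) if i not in drop]
    y_out = [label for i, label in enumerate(y_bal) if i not in drop]
    return X_out, y_out


# Toy data (made up): six clean (0) modules and three defective (1) modules
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.4, 0.2],
     [0.5, 0.5], [1.0, 1.0], [1.1, 0.9], [0.9, 1.1]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
X_res, y_res = smote_tomek(X, y, k=2)
print(y_res.count(0), y_res.count(1))
```

Because the two toy clusters are well separated, no Tomek links form here and the output is simply balanced; on real, overlapping metric data the cleaning step would remove boundary points. The brute-force neighbour search is quadratic and only meant for illustration.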
We compared our proposed approach with existing SDP approaches based on several standard performance measures. The comparison results demonstrated that the proposed approach significantly outperforms existing state-of-the-art SDP approaches on most datasets.
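The abstract evaluates the models with threshold-based measures (accuracy, precision, recall, F-measure, MCC) and score-based measures (AUC, AUCPR, MSE). As a hedged illustration of how these are conventionally computed for binary defect labels (1 = defective), the sketch below derives them from true labels and predicted probabilities. The 0.5 decision threshold, the rank-based AUC, and the average-precision estimate of AUCPR are assumptions, since the record does not state which estimators the authors used; the scores are made-up values.

```python
import math


def _safe_div(num, den):
    # return 0.0 instead of raising when a denominator is zero
    return num / den if den else 0.0


def threshold_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F-measure and MCC from hard 0/1 labels
    (1 = defective, treated as the positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": _safe_div(2 * precision * recall, precision + recall),
        "mcc": _safe_div(tp * tn - fp * fn,
                         math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))),
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random defective module is scored above a
    random clean one (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise estimate of AUCPR: mean precision at the rank of each positive."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, t) in enumerate(ranked, start=1):
        if t == 1:
            hits += 1
            total += hits / rank
    return _safe_div(total, hits)


def mse(y_true, scores):
    """Mean squared error between 0/1 labels and predicted probabilities."""
    return sum((t - s) ** 2 for t, s in zip(y_true, scores)) / len(y_true)


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]  # hypothetical model outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed 0.5 threshold
print(threshold_metrics(y_true, y_pred))
print(roc_auc(y_true, scores), average_precision(y_true, scores), mse(y_true, scores))
```

The example also shows why accuracy alone can mislead on imbalanced data: MCC and AUC penalize errors on both classes, whereas accuracy can stay high when a model ignores the minority class.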
Pages: 673-707
Number of pages: 35
Related papers
50 in total
  • [41] A Real-time Lithological Identification Method based on SMOTE-Tomek and ICSA Optimization
    DENG Song
    PAN Haoyu
    LI Chaowei
    YAN Xiaopeng
    WANG Jiangshuai
    SHI Lin
    PEI Chunyu
    CAI Meng
    Acta Geologica Sinica (English Edition), 2024, 98 (02): 518-530
  • [42] A NOVEL MODEL FOR STOCK CLOSING PRICE PREDICTION USING CNN-ATTENTION-GRU-ATTENTION
    Lu, Wenjie
    Li, Jiazheng
    Wang, Jingyang
    Wu, Shaowen
    ECONOMIC COMPUTATION AND ECONOMIC CYBERNETICS STUDIES AND RESEARCH, 2022, 56 (03): 251-264
  • [43] A Novel Cryptocurrency Prediction Method Using Optimum CNN
    Hasan, Syed H.
    Hasan, Syeda Huyam
    Ahmed, Mohammed Salih
    Hasan, Syed Hamid
    CMC-COMPUTERS MATERIALS & CONTINUA, 2022, 71 (01): 1051-1063
  • [44] Bankruptcy Prediction Using Deep Learning Approach Based on Borderline SMOTE
    Smiti, Salima
    Soui, Makram
    Information Systems Frontiers, 2020, 22 (05): 1067-1083
  • [45] Trajectory Prediction and Intention Recognition Based on CNN-GRU
    Du, Jinghao
    Lu, Dongdong
    Li, Fei
    Liu, Ke
    Qiu, Xiaolan
    IEEE ACCESS, 2025, 13: 26945-26957
  • [46] A novel CNN-GRU-LSTM based deep learning model for accurate traffic prediction
    Singh, Vandana
    Sahana, Sudip Kumar
    Bhattacharjee, Vandana
    Discover Computing, 28 (1)
  • [47] A Novel Method for Identification of Glutarylation Sites Combining Borderline-SMOTE With Tomek Links Technique in Imbalanced Data
    Ning, Qiao
    Zhao, Xiaowei
    Ma, Zhiqiang
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2022, 19 (05): 2632-2641
  • [49] Charging load prediction method for electric vehicles based on an ISSA-CNN-GRU model
    Yao F.
    Tang J.
    Chen S.
    Dong X.
    Dianli Xitong Baohu yu Kongzhi/Power System Protection and Control, 2023, 51 (16): 158-167
  • [50] A Multichannel-Based CNN and GRU Method for Short-Term Wind Power Prediction
    Gao, Jian
    Ye, Xi
    Lei, Xia
    Huang, Bohao
    Wang, Xi
    Wang, Lili
    ELECTRONICS, 2023, 12 (21)