A novel approach for software defect prediction using CNN and GRU based on SMOTE Tomek method

Cited by: 16
Authors
Khleel, Nasraldeen Alnor Adam [1]
Nehez, Karoly [1]
Affiliations
[1] Univ Miskolc, Dept Informat Engn, H-3515 Miskolc, Hungary
Keywords
Software defect prediction (SDP); Software metrics; Deep learning (DL); CNN; GRU; Class imbalance; Sampling techniques; SMOTE Tomek; NEURAL-NETWORK; RULES;
DOI
10.1007/s10844-023-00793-1
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Software defect prediction (SDP) plays a vital role in improving the quality of software projects and reducing maintenance risks by detecting defective software components. SDP uses historical defect data to model the relationship between software metrics and defects via diverse methodologies. Many prediction models based on machine learning (ML) and deep learning (DL) have been developed and adopted to recognize defective software modules, and many methodologies and frameworks have been presented. Class imbalance is one of the most challenging problems these models face in binary classification. When the class distribution is imbalanced, accuracy may be high, but the models fail to recognize instances of the minority class, leading to weak classification. So far, few studies have addressed the problem of class imbalance in SDP. In this study, a data sampling method is introduced to address the class imbalance problem and improve the performance of ML models in SDP. The proposed approach is based on a convolutional neural network (CNN) and a gated recurrent unit (GRU), each combined with the synthetic minority oversampling technique plus Tomek links (SMOTE Tomek), to predict software defects. To establish the efficiency of the proposed models, experiments were conducted on benchmark datasets obtained from the PROMISE repository. The experimental results were compared and evaluated in terms of accuracy, precision, recall, F-measure, Matthews correlation coefficient (MCC), the area under the ROC curve (AUC), the area under the precision-recall curve (AUCPR), and mean squared error (MSE). The results showed that the proposed models predict software defects more effectively on the balanced datasets than on the original datasets, with an improvement of up to 19% for the CNN model and 24% for the GRU model in terms of AUC.
We compared our proposed approach with existing SDP approaches based on several standard performance measures. The comparison results demonstrated that the proposed approach significantly outperforms existing state-of-the-art SDP approaches on most datasets.
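The abstract's point that accuracy can be high while the minority class goes unrecognized can be illustrated with a small worked example. The sketch below is not the authors' code: the function name `binary_metrics`, the 95/5 class split, and the convention of returning 0 for undefined ratios are all assumptions made for illustration.

```python
from math import sqrt


def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics; label 1 = defective (assumed minority class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Assumption: undefined ratios (zero denominator) are reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "mcc": mcc,
    }


# Hypothetical imbalanced dataset: 95 clean modules, 5 defective ones.
y_true = [0] * 95 + [1] * 5
# A degenerate model that always predicts "not defective".
y_majority = [0] * 100

metrics = binary_metrics(y_true, y_majority)
print(metrics)
```

Under these assumptions the degenerate model reaches 95% accuracy while its recall, F-measure, and MCC are all zero, which is why the study reports imbalance-aware measures alongside accuracy.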
Pages: 673 - 707
Number of pages: 35
Related papers
50 in total
  • [1] A novel approach for software defect prediction using CNN and GRU based on SMOTE Tomek method
    Nasraldeen Alnor Adam Khleel
    Károly Nehéz
    Journal of Intelligent Information Systems, 2023, 60 : 673 - 707
  • [2] Hybrid SMOTE-Ensemble Approach for Software Defect Prediction
    Alsawalqah, Hamad
    Faris, Hossam
    Aljarah, Ibrahim
    Alnemer, Loai
    Alhindawi, Nouh
    SOFTWARE ENGINEERING TRENDS AND TECHNIQUES IN INTELLIGENT SYSTEMS, CSOC2017, VOL 3, 2017, 575 : 355 - 366
  • [3] Software Defect Prediction Using SMOTE and Artificial Neural Network
    Dipa, Wisnu Arya
    Sunindyo, Wikan Danar
    PROCEEDINGS OF 2021 INTERNATIONAL CONFERENCE ON DATA AND SOFTWARE ENGINEERING (ICODSE): DATA AND SOFTWARE ENGINEERING FOR SUPPORTING SUSTAINABLE DEVELOPMENT GOALS, 2021,
  • [4] Attention based GRU-LSTM for software defect prediction
    Munir, Hafiz Shahbaz
    Ren, Shengbing
    Mustafa, Mubashar
    Siddique, Chaudry Naeem
    Qayyum, Shazib
    PLOS ONE, 2021, 16 (03):
  • [5] SMOTE-Based Homogeneous Ensemble Methods for Software Defect Prediction
    Balogun, Abdullateef O.
    Lafenwa-Balogun, Fatimah B.
    Mojeed, Hammed A.
    Adeyemo, Victor E.
    Akande, Oluwatobi N.
    Akintola, Abimbola G.
    Bajeh, Amos O.
    Usman-Hamza, Fatimah E.
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2020, PT VI, 2020, 12254 : 615 - 631
  • [6] Software fault prediction with imbalanced datasets using SMOTE-Tomek sampling technique and Genetic Algorithm models
    Gupta, Mansi
    Rajnish, Kumar
    Bhattacharjee, Vandana
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (16) : 47627 - 47648
  • [7] Software fault prediction with imbalanced datasets using SMOTE-Tomek sampling technique and Genetic Algorithm models
    Mansi Gupta
    Kumar Rajnish
    Vandana Bhattacharjee
    Multimedia Tools and Applications, 2024, 83 : 47627 - 47648
  • [8] An Empirical Study on Software Defect Prediction Using Over-Sampling by SMOTE
    Pak, Cholmyong
    Wang, Tian Tian
    Su, Xiao Hong
    INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, 2018, 28 (06) : 811 - 830
  • [9] Investigation on the stability of SMOTE-based oversampling techniques in software defect prediction
    Feng, Shuo
    Keung, Jacky
    Yu, Xiao
    Xiao, Yan
    Zhang, Miao
    INFORMATION AND SOFTWARE TECHNOLOGY, 2021, 139
  • [10] A novel approach for software defect prediction using fuzzy decision trees
    Marian, Zsuzsanna
    Mircea, Ioan-Gabriel
    Czibula, Istvan-Gergely
    Czibula, Gabriela
    PROCEEDINGS OF 2016 18TH INTERNATIONAL SYMPOSIUM ON SYMBOLIC AND NUMERIC ALGORITHMS FOR SCIENTIFIC COMPUTING (SYNASC), 2016, : 240 - 247