Informative Software Defect Data Generation and Prediction: INF-SMOTE

Cited by: 0
Authors
Rekha, G. [1]
Shailaja, K. [2]
Jatoth, Chandrashekar [3]
Affiliations
[1] Koneru Lakshmaiah Educ Fdn, Dept Comp Sci & Engn, Hyderabad, India
[2] Vasavi Coll Engn, Dept Comp Sci & Engn, Hyderabad, India
[3] Natl Inst Technol, Dept Comp Sci & Engn, Raipur, India
Source
ADVANCES IN COMPUTING AND DATA SCIENCES (ICACDS 2022), PT I | 2022 / Vol. 1613
Keywords
Class imbalance problem; Imbalanced classification; Imbalanced datasets; Over-sampling; SMOTE
DOI
10.1007/978-3-031-12638-3_16
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Highly imbalanced data typically make accurate prediction difficult. Software defect datasets tend to have fewer defective modules than non-defective modules. Synthetic oversampling approaches such as SMOTE address this concern by creating new minority-class (defective) modules to balance the class distribution before a model is trained. Despite their success, these approaches have two shortcomings: 1) they over-generalize and generate near-duplicate (less diverse) data instances because noisy samples are oversampled, and 2) they increase the overlap between classes around the class boundaries. This paper introduces INF-SMOTE (Informative Synthetic Minority Oversampling Technique), a novel and efficient synthetic oversampling approach for software defect datasets that targets both shortcomings simultaneously. INF-SMOTE identifies the informative minority samples that are appropriate for oversampling. The process has two steps: 1) it identifies and removes the noisy and overlapping samples from the borderline minority instances based on the sampling seeds, and 2) it generates synthetic samples from the informative minority samples. Experiments were conducted on 12 releases of SDP (Software Defect Prediction) datasets from the NASA repository. Compared with state-of-the-art techniques, INF-SMOTE improves defect prediction performance.
Pages: 179-191
Page count: 13
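
The two-step process described in the abstract (filter out noisy minority samples, then interpolate between the remaining informative ones) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the filtering rule, so the code below assumes a simple criterion (a minority sample is treated as noisy when all of its k nearest neighbours belong to the majority class) and uses standard SMOTE-style linear interpolation for the second step. The function name `inf_smote_sketch` and all of its parameters are hypothetical.

```python
import numpy as np


def inf_smote_sketch(X, y, minority_label=1, k=5, n_new=None, seed=0):
    """Hypothetical two-step oversampler loosely following the abstract."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    min_idx = np.flatnonzero(y == minority_label)
    n_maj = int(np.sum(y != minority_label))
    if n_new is None:
        # Default: add enough samples to match the majority class size.
        n_new = max(n_maj - len(min_idx), 0)

    # Step 1 (ASSUMED criterion, not taken from the paper): treat a minority
    # sample as noisy if all of its k nearest neighbours are majority samples.
    k_all = min(k, len(X) - 1)
    dists = np.linalg.norm(X[min_idx, None, :] - X[None, :, :], axis=2)
    dists[np.arange(len(min_idx)), min_idx] = np.inf  # ignore self-distance
    nn = np.argsort(dists, axis=1)[:, :k_all]
    maj_frac = np.mean(y[nn] != minority_label, axis=1)
    informative = min_idx[maj_frac < 1.0]
    if len(informative) < 2 or n_new == 0:
        return X, y

    # Step 2: SMOTE-style interpolation restricted to informative samples.
    P = X[informative]
    d = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    kk = min(k, len(P) - 1)
    nbrs = np.argsort(d, axis=1)[:, :kk]
    base = rng.integers(0, len(P), size=n_new)          # seed sample
    pick = nbrs[base, rng.integers(0, kk, size=n_new)]  # one of its neighbours
    gap = rng.random((n_new, 1))                        # position on the segment
    synth = P[base] + gap * (P[pick] - P[base])

    X_out = np.vstack([X, synth])
    y_out = np.concatenate([y, np.full(n_new, minority_label, dtype=y.dtype)])
    return X_out, y_out
```

Because synthetic points are convex combinations of informative minority samples only, a minority sample buried inside the majority region never serves as an interpolation endpoint in this sketch. The filtering rule in the paper itself may differ.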
Related papers
50 in total
  • [1] Software Defect Prediction Using SMOTE and Artificial Neural Network
    Dipa, Wisnu Arya
    Sunindyo, Wikan Danar
    PROCEEDINGS OF 2021 INTERNATIONAL CONFERENCE ON DATA AND SOFTWARE ENGINEERING (ICODSE): DATA AND SOFTWARE ENGINEERING FOR SUPPORTING SUSTAINABLE DEVELOPMENT GOALS, 2021,
  • [2] Hybrid SMOTE-Ensemble Approach for Software Defect Prediction
    Alsawalqah, Hamad
    Faris, Hossam
    Aljarah, Ibrahim
    Alnemer, Loai
    Alhindawi, Nouh
    SOFTWARE ENGINEERING TRENDS AND TECHNIQUES IN INTELLIGENT SYSTEMS, CSOC2017, VOL 3, 2017, 575 : 355 - 366
  • [3] Class Imbalance Data-Generation for Software Defect Prediction
    Li, Zheng
    Zhang, Xingyao
    Guo, Junxia
    Shang, Ying
    2019 26TH ASIA-PACIFIC SOFTWARE ENGINEERING CONFERENCE (APSEC), 2019, : 276 - 283
  • [4] SMOTE-Based Homogeneous Ensemble Methods for Software Defect Prediction
    Balogun, Abdullateef O.
    Lafenwa-Balogun, Fatimah B.
    Mojeed, Hammed A.
    Adeyemo, Victor E.
    Akande, Oluwatobi N.
    Akintola, Abimbola G.
    Bajeh, Amos O.
    Usman-Hamza, Fatimah E.
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2020, PT VI, 2020, 12254 : 615 - 631
  • [5] Investigation on the stability of SMOTE-based oversampling techniques in software defect prediction
    Feng, Shuo
    Keung, Jacky
    Yu, Xiao
    Xiao, Yan
    Zhang, Miao
    INFORMATION AND SOFTWARE TECHNOLOGY, 2021, 139
  • [6] An Empirical Study on Software Defect Prediction Using Over-Sampling by SMOTE
    Pak, Cholmyong
    Wang, Tian Tian
    Su, Xiao Hong
    INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, 2018, 28 (06) : 811 - 830
  • [7] Software Defect Prediction with Skewed Data
    Seliya, Naeem
    Khoshgoftaar, Taghi M.
    16TH ISSAT INTERNATIONAL CONFERENCE ON RELIABILITY AND QUALITY IN DESIGN, 2010, : 403 - +
  • [8] The impact of the distance metric and measure on SMOTE-based techniques in software defect prediction
    Feng, Shuo
    Keung, Jacky
    Zhang, Peichang
    Xiao, Yan
    Zhang, Miao
    INFORMATION AND SOFTWARE TECHNOLOGY, 2022, 142
  • [9] A novel approach for software defect prediction using CNN and GRU based on SMOTE Tomek method
    Khleel, Nasraldeen Alnor Adam
    Nehez, Karoly
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2023, 60 (03) : 673 - 707
  • [10] Cross-Project Software Defect Prediction Based on SMOTE and Deep Canonical Correlation Analysis
    Fan, Xin
    Zhang, Shuqing
    Wu, Kaisheng
    Zheng, Wei
    Ge, Yu
    CMC-COMPUTERS MATERIALS & CONTINUA, 2024, 78 (02) : 1687 - 1711