Software Defect Prediction Method Based on Clustering Ensemble Learning

Cited by: 0
Authors
Tao, Hongwei [1]
Cao, Qiaoling [1]
Chen, Haoran [1]
Li, Yanting [1]
Niu, Xiaoxu [1]
Wang, Tao [1]
Geng, Zhenhao [1]
Shang, Songtao [1]
Affiliations
[1] Zhengzhou Univ Light Ind, Sch Comp Sci & Technol, Zhengzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
clustering ensemble learning; feature selection; software defect prediction; FEATURE-SELECTION; QUALITY;
DOI
10.1049/2024/6294422
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Software defect prediction aims to assess and predict potential defects in software projects, and it has made significant progress in recent years. Previous studies have largely relied on supervised learning, which requires a substantial amount of labeled historical defect data to train the models. However, obtaining these labeled data often demands significant time and resources. In contrast, software defect prediction based on unsupervised learning does not depend on labeled data, which removes the need for large-scale data labeling, saves considerable time and resources, and provides a more flexible way to ensure software quality. This paper applies unsupervised learning to software defect prediction on data from 16 projects across two public datasets (PROMISE and NASA). For the feature selection step, a chi-squared sparse feature selection method is proposed that combines chi-squared tests with sparse principal component analysis (SPCA). Specifically, the chi-squared test is first used to select the most statistically significant features, and SPCA is then applied to reduce the dimensionality of these features. In the clustering step, the dot product matrix and the Pearson correlation coefficient (PCC) matrix are used to construct weighted adjacency matrices, and a clustering overlap method is proposed. This method integrates spectral clustering, Newman clustering, fluid clustering, and Clauset-Newman-Moore (CNM) clustering through ensemble learning. Experimental results indicate that, in the absence of labeled data, feature selection with the chi-squared sparse method yields superior performance, and the proposed clustering overlap method outperforms or is comparable to the four baseline clustering methods.
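As a rough illustration of the clustering step's input, the sketch below builds a dot product matrix and a PCC matrix over a few software modules and turns each into a weighted adjacency matrix. It is only a minimal sketch under stated assumptions, not the paper's implementation: the toy feature vectors, the function names, the removal of self-loops, and the clipping of negative weights to zero are all choices made here for illustration, and the abstract does not specify them.

```python
import math

def dot_product_matrix(X):
    """W[i][j] is the dot product of module i's and module j's feature vectors."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: a constant vector is treated as uncorrelated
    return cov / (norm_u * norm_v)

def pcc_matrix(X):
    """W[i][j] is the Pearson correlation between modules i and j."""
    return [[pearson(xi, xj) for xj in X] for xi in X]

def to_adjacency(W):
    """Assumption: remove self-loops and clip negative weights to zero."""
    n = len(W)
    return [[0.0 if i == j else max(0.0, W[i][j]) for j in range(n)]
            for i in range(n)]

# Hypothetical toy data: three modules described by three metric values each.
X = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.5],
    [3.0, 1.0, 0.0],
]

A_dot = to_adjacency(dot_product_matrix(X))
A_pcc = to_adjacency(pcc_matrix(X))
```

Each adjacency matrix can then be read as a weighted graph whose nodes are modules, which is the kind of input that graph-based methods such as spectral or CNM clustering operate on.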
Pages: 19