Software Defect Prediction Method Based on Clustering Ensemble Learning

Authors
Tao, Hongwei [1]
Cao, Qiaoling [1]
Chen, Haoran [1]
Li, Yanting [1]
Niu, Xiaoxu [1]
Wang, Tao [1]
Geng, Zhenhao [1]
Shang, Songtao [1]
Affiliations
[1] Zhengzhou Univ Light Ind, Sch Comp Sci & Technol, Zhengzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
clustering ensemble learning; feature selection; software defect prediction; quality
DOI
10.1049/2024/6294422
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Software defect prediction aims to assess and predict potential defects in software projects, and the technique has made significant progress in recent years. Previous studies largely relied on supervised learning, which requires a substantial amount of labeled historical defect data to train the models. However, obtaining such labeled data often demands significant time and resources. In contrast, software defect prediction based on unsupervised learning does not depend on labeled data, which removes the need for large-scale data labeling, saves considerable time and resources, and provides a more flexible way to ensure software quality. This paper applies unsupervised learning methods to software defect prediction on data from 16 projects across two public datasets (PROMISE and NASA). For the feature selection step, a chi-squared sparse feature selection method is proposed, which combines chi-squared tests with sparse principal component analysis (SPCA): the chi-squared test is first used to select the most statistically significant features, and SPCA is then applied to reduce the dimensionality of these features. In the clustering step, the dot product matrix and the Pearson correlation coefficient (PCC) matrix are used to construct weighted adjacency matrices, and a clustering overlap method is proposed that integrates spectral clustering, Newman clustering, fluid clustering, and Clauset-Newman-Moore (CNM) clustering through ensemble learning. Experimental results indicate that, in the absence of labeled data, the chi-squared sparse method for feature selection demonstrates superior performance, and the proposed clustering overlap method matches or outperforms the four baseline clustering methods.
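The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the matrices are turned into edge weights, so the clipping of negative dot products, the rescaling of PCC values to [0, 1], the two-way Fiedler-vector split, and all function names and toy data below are assumptions made for illustration. Only one of the four base clusterers (spectral clustering) is shown; the clustering overlap ensemble itself is not reproduced.

```python
import numpy as np

def dot_product_adjacency(X):
    """Weighted adjacency from pairwise dot products of module feature vectors."""
    A = X @ X.T
    A = np.clip(A, 0.0, None)      # assumption: negative similarities become 0
    np.fill_diagonal(A, 0.0)       # no self-loops
    return A

def pearson_adjacency(X):
    """Weighted adjacency from the Pearson correlation between modules (rows)."""
    R = np.nan_to_num(np.corrcoef(X))  # constant rows would give NaN
    A = (1.0 + R) / 2.0                # assumption: map [-1, 1] onto [0, 1]
    np.fill_diagonal(A, 0.0)
    return A

def spectral_bipartition(A):
    """Two-way spectral split using the normalized graph Laplacian."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    d_inv_sqrt[nonzero] = 1.0 / np.sqrt(deg[nonzero])
    # L_sym = I - D^{-1/2} A D^{-1/2}
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)         # eigenvalues in ascending order
    fiedler = eigvecs[:, 1] * d_inv_sqrt   # second-smallest eigenvector, rescaled
    return (fiedler > 0).astype(int)

# Hypothetical toy data: 6 modules x 4 metrics, two groups with opposite trends.
X = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 3.0, 4.0, 5.5],
    [1.0, 2.5, 3.0, 4.2],
    [4.0, 3.0, 2.5, 1.0],
    [5.0, 3.0, 3.5, 1.5],
    [4.0, 2.0, 2.2, 1.5],
])

A_pcc = pearson_adjacency(X)
labels = spectral_bipartition(A_pcc)
print(labels)  # first three modules share one label, last three the other
```

Which group receives label 0 or 1 depends on the sign of the eigenvector and carries no meaning on its own; in an unsupervised defect prediction setting a separate rule is needed to decide which cluster is treated as defect-prone.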
Pages: 19