Software Defect Prediction Method Based on Clustering Ensemble Learning

Authors
Tao, Hongwei [1]
Cao, Qiaoling [1]
Chen, Haoran [1]
Li, Yanting [1]
Niu, Xiaoxu [1]
Wang, Tao [1]
Geng, Zhenhao [1]
Shang, Songtao [1]
Affiliations
[1] Zhengzhou Univ Light Ind, Sch Comp Sci & Technol, Zhengzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
clustering ensemble learning; feature selection; software defect prediction; quality
DOI
10.1049/2024/6294422
Chinese Library Classification number
TP31 [Computer Software]
Subject classification code
081202; 0835
Abstract
Software defect prediction aims to assess and predict potential defects in software projects, and it has made significant progress in recent years. Previous studies relied largely on supervised learning, which requires a substantial amount of labeled historical defect data to train the models, and obtaining such labeled data often demands significant time and resources. In contrast, software defect prediction based on unsupervised learning does not depend on labeled data. It eliminates the need for large-scale data labeling, which saves considerable time and resources and provides a more flexible way to ensure software quality. This paper applies unsupervised learning methods to software defect prediction on data from 16 projects across two public datasets (PROMISE and NASA). For the feature selection step, a chi-squared sparse feature selection method is proposed that combines chi-squared tests with sparse principal component analysis (SPCA): the chi-squared test is first used to select the most statistically significant features, and SPCA is then applied to reduce the dimensionality of these features. In the clustering step, the dot product matrix and the Pearson correlation coefficient (PCC) matrix are used to construct weighted adjacency matrices, and a clustering overlap method is proposed that integrates spectral clustering, Newman clustering, fluid clustering, and Clauset-Newman-Moore (CNM) clustering through ensemble learning. Experimental results indicate that, in the absence of labeled data, the chi-squared sparse method for feature selection demonstrates superior performance, and the proposed clustering overlap method performs as well as or better than the four baseline clustering methods.
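The clustering step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how negative correlations are handled or how the clustering overlap method combines the four partitions, so clipping negative PCC values to zero and combining binary partitions by majority vote are assumptions made here for illustration. The function names (`dot_adjacency`, `pcc_adjacency`, `overlap_vote`) are hypothetical, and the four base clusterers are represented only by their output label lists.

```python
from math import sqrt


def dot_adjacency(rows):
    """Weighted adjacency from pairwise dot products of module feature vectors.
    The diagonal is set to 0 so that no node links to itself."""
    n = len(rows)
    return [[0.0 if i == j else sum(a * b for a, b in zip(rows[i], rows[j]))
             for j in range(n)] for i in range(n)]


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length vectors."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0


def pcc_adjacency(rows):
    """Weighted adjacency from pairwise PCC values.
    Assumption: negative correlations are clipped to 0 to keep weights non-negative."""
    n = len(rows)
    return [[0.0 if i == j else max(0.0, pearson(rows[i], rows[j]))
             for j in range(n)] for i in range(n)]


def align(reference, labels):
    """Cluster ids are arbitrary, so flip a binary labeling when it agrees
    with the reference on fewer than half of the items."""
    agree = sum(r == l for r, l in zip(reference, labels))
    return labels if 2 * agree >= len(labels) else [1 - l for l in labels]


def overlap_vote(labelings):
    """Assumed combination rule: strict majority vote over binary labelings,
    each aligned to the first one. Ties resolve to cluster 0."""
    ref = labelings[0]
    aligned = [align(ref, lab) for lab in labelings]
    return [1 if 2 * sum(lab[i] for lab in aligned) > len(aligned) else 0
            for i in range(len(ref))]


if __name__ == "__main__":
    # Four toy modules with three metrics each (made-up values).
    modules = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.1], [3.0, 2.0, 1.0], [6.0, 4.1, 2.0]]
    print(pcc_adjacency(modules))
    # Stand-ins for the outputs of four base clustering methods.
    partitions = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
    print(overlap_vote(partitions))
```

In practice the adjacency matrices would be passed to graph clustering routines, and mapping the resulting clusters to "defective" or "clean" would need a separate rule that the abstract does not specify.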
Pages: 19