A Noise Tolerable Feature Selection Framework for Software Defect Prediction

Authors
Liu W.-S. [1, 2]
Chen X. [1, 3]
Gu Q. [1, 2]
Liu S.-L. [1, 2]
Chen D.-X. [1, 2]
Affiliations
[1] State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing
[2] Department of Computer Science and Technology, Nanjing University, Nanjing
[3] School of Computer Science and Technology, Nantong University, Nantong, 226019, Jiangsu
Funding
National Natural Science Foundation of China
Keywords
Cluster analysis; Feature selection; Noise tolerable ability; Software defect prediction; Software quality assurance
DOI
10.11897/SP.J.1016.2018.00506
Abstract
Software defect prediction builds a defect prediction model by mining historical software repositories and then uses the trained model to predict potentially defect-prone program modules. However, noise is inevitable when labeling or measuring software entities. Although some researchers have investigated the noise tolerance of existing feature selection methods, few studies have proposed new feature selection methods with a degree of noise tolerance. To address this issue, we propose a novel framework, FECS (FEature Clustering with Selection strategies). FECS first groups the original features into a specified number of clusters using cluster analysis, and then selects the most representative feature from each cluster based on three heuristic feature selection strategies that we propose. In our empirical studies, we use datasets from real-world software projects, such as Eclipse and NASA. We first apply a set of data preprocessing steps to improve the quality of these datasets, and then inject class-level and feature-level noise simultaneously to simulate noisy datasets. Using classical feature selection methods as baselines, we confirm the effectiveness of FECS. By analyzing the effects of varying the percentage of selected features, the noise injection rate, and the noise type, we also provide guidelines for using FECS. © 2018, Science Press. All rights reserved.
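The two FECS steps described in the abstract (cluster the features, then keep one representative per cluster) and the simultaneous class-level and feature-level noise injection used in the experiments can be illustrated with a minimal Python sketch. This is a hypothetical illustration, not the authors' implementation: the abstract does not name the clustering algorithm, the three heuristic selection strategies, or the exact noise model. The sketch therefore assumes k-medoids clustering with distance 1 - |Pearson r| between feature columns, a single stand-in heuristic (keep the cluster member most correlated with the defect label), and a noise model that flips binary labels and overwrites feature values with random values from each feature's observed range. The names `cluster_features`, `select_features`, and `inject_noise` are illustrative only.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def cluster_features(columns, k, iters=20):
    """Group feature columns into k clusters with a simple k-medoids loop.

    ASSUMPTION: distance = 1 - |Pearson r|, so redundant (strongly correlated)
    features share a cluster. Requires 1 <= k <= number of features.
    """
    m = len(columns)
    dist = [[1 - abs(pearson(columns[i], columns[j])) for j in range(m)]
            for i in range(m)]
    # Deterministic farthest-point initialisation over features not yet chosen.
    medoids = [0]
    while len(medoids) < k:
        candidates = [i for i in range(m) if i not in medoids]
        medoids.append(max(candidates, key=lambda i: min(dist[i][c] for c in medoids)))
    for _ in range(iters):
        clusters = {c: [] for c in medoids}
        for i in range(m):
            # Each medoid stays in its own cluster, so no cluster is ever empty.
            nearest = i if i in medoids else min(medoids, key=lambda c: dist[i][c])
            clusters[nearest].append(i)
        # New medoid: the member with the smallest total distance to its cluster.
        new = [min(members, key=lambda i: sum(dist[i][j] for j in members))
               for members in clusters.values()]
        if sorted(new) == sorted(medoids):
            break
        medoids = new
    return list(clusters.values())


def select_features(columns, labels, k):
    """Keep one feature per cluster and return the sorted column indices.

    HYPOTHETICAL stand-in for the paper's heuristic strategies: keep the
    member whose |Pearson r| with the defect label is largest.
    """
    clusters = cluster_features(columns, k)
    return sorted(max(members, key=lambda i: abs(pearson(columns[i], labels)))
                  for members in clusters)


def inject_noise(rows, labels, class_rate, feature_rate, seed=0):
    """Return noisy copies of (rows, labels); the inputs are not modified.

    ASSUMED noise model: flip round(class_rate * n) binary labels (class-level
    noise) and overwrite round(feature_rate * n * m) cells with a uniform
    random value from that feature's observed range (feature-level noise).
    """
    rng = random.Random(seed)
    n, m = len(rows), len(rows[0])
    noisy_labels = list(labels)
    for i in rng.sample(range(n), round(class_rate * n)):
        noisy_labels[i] = 1 - noisy_labels[i]
    lo = [min(r[j] for r in rows) for j in range(m)]
    hi = [max(r[j] for r in rows) for j in range(m)]
    noisy_rows = [list(r) for r in rows]
    cells = [(i, j) for i in range(n) for j in range(m)]
    for i, j in rng.sample(cells, round(feature_rate * n * m)):
        noisy_rows[i][j] = rng.uniform(lo[j], hi[j])
    return noisy_rows, noisy_labels
```

Keeping one feature per cluster, instead of ranking every feature independently, is what lets a clustering-based framework discard redundant metrics; the label-relevance heuristic above is only one plausible choice among many, and a noise-tolerance study would rerun `select_features` on the output of `inject_noise` at several rates.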
Pages: 506-520
Number of pages: 14