Robust Boundary-Enhanced GMM-SMOTE Software Defect Detection Method

Cited by: 0
Authors
Luo S. [1]
Su X. [1]
Pan L. [1]
Affiliations
[1] School of Information and Electronics, Beijing Institute of Technology, Beijing
Keywords
Data imbalance; Gaussian mixture model; Oversampling; Software defect detection
DOI
10.15918/j.tbit1001-0645.2019.312
Abstract
Software defects are bugs that can disrupt the normal operation of a system or piece of software, and detecting and locating them is costly. Automatic defect detection models based on software data have become an important tool for defect discovery. However, accurately labeled defective samples are rare, and the rates of missing labels and mislabeling are high, which causes existing data balancing methods to amplify noise and blur classification boundaries. To solve this problem, a robust boundary-enhanced GMM-SMOTE software defect detection method is proposed. The method uses Gaussian mixture clustering to divide the software data set into multiple clusters, selects reliable samples based on the intra-cluster class ratio, identifies boundary samples based on posterior probability to guide weighted data balancing, and finally builds a software defect detection model on the balanced data. Experimental results on multiple public NASA data sets show that GMM-SMOTE balances the data while suppressing noise and enhancing class boundaries, and that it effectively improves software defect detection performance, which gives the method practical value. © 2021, Editorial Department of Transaction of Beijing Institute of Technology. All rights reserved.
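The pipeline described in the abstract (cluster, keep reliable clusters, weight boundary samples, interpolate) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the reliability threshold `min_ratio`, the boundary weight `1 - max(posterior) + eps`, and the function name `gmm_smote` are all assumptions chosen for illustration. The sketch also assumes the per-sample GMM posteriors have already been computed elsewhere (for example with scikit-learn's `GaussianMixture.predict_proba`) and that the defective class is labeled 1.

```python
import random

def hard_assign(p):
    """Index of the mixture component with the highest posterior."""
    return max(range(len(p)), key=lambda k: p[k])

def gmm_smote(X, y, posteriors, n_new, min_ratio=0.5, eps=0.05, seed=0):
    """Illustrative GMM-guided SMOTE (assumed details, not the paper's exact rules).

    X: list of feature vectors; y: 0/1 labels (1 = defective, minority);
    posteriors: per-sample GMM component probabilities; n_new: samples to create.
    """
    rng = random.Random(seed)
    cluster = [hard_assign(p) for p in posteriors]

    # Step 1: intra-cluster class ratio. Clusters dominated by the majority
    # class are treated as unreliable, so isolated (possibly mislabeled)
    # minority points inside them are never used as seeds.
    total, minority = {}, {}
    for k, label in zip(cluster, y):
        total[k] = total.get(k, 0) + 1
        minority[k] = minority.get(k, 0) + label
    reliable = {k for k in total if minority[k] / total[k] >= min_ratio}

    # Step 2: seeds are minority samples in reliable clusters, grouped by cluster.
    members = {}
    for i, (k, label) in enumerate(zip(cluster, y)):
        if label == 1 and k in reliable:
            members.setdefault(k, []).append(i)
    seeds = [i for idx in members.values() if len(idx) >= 2 for i in idx]
    if not seeds:
        return []

    # Step 3: boundary weighting (assumption). A flat posterior means the
    # sample lies near a cluster boundary, so it is sampled more often.
    weights = [1.0 - max(posteriors[i]) + eps for i in seeds]

    # Step 4: SMOTE interpolation toward the nearest same-cluster minority sample.
    synthetic = []
    for _ in range(n_new):
        i = rng.choices(seeds, weights=weights, k=1)[0]
        j = min((j for j in members[cluster[i]] if j != i),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
        t = rng.random()
        synthetic.append([a + t * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic
```

Calling `gmm_smote(X, y, posteriors, n_new)` returns `n_new` synthetic minority vectors, or an empty list when no reliable cluster contains at least two minority samples. Using a single nearest neighbor keeps the sketch short; standard SMOTE picks randomly among several nearest neighbors.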
Pages: 303-310
Number of pages: 7
Related papers
14 in total
  • [1] WEI Shengjun, HE Tao, HU Changzhen, et al., Predicting software security vulnerabilities with component dependency graphs, Transactions of Beijing Institute of Technology, 38, 5, pp. 525-530, (2018)
  • [2] HUDA S, ALYAHYA S, ALI M M, et al., A framework for software defect prediction and metric selection, IEEE Access, 6, pp. 2844-2858, (2017)
  • [3] SHEPPERD M, BOWES D, HALL T., Researcher bias: the use of machine learning in software defect prediction, IEEE Transactions on Software Engineering, 40, 6, pp. 603-616, (2014)
  • [4] ALI A, SHAMSUDDIN S M, RALESCU A L., Classification with class imbalance problem: a review, Int. J. Advance Soft Compu. Appl, 7, 3, pp. 176-204, (2015)
  • [5] NAUFAL M F, KUSUMA S F., Software defect detection based on selected complexity metrics using fuzzy association rule mining and defective module oversampling, 16th International Joint Conference on Computer Science and Software Engineering (JCSSE), pp. 330-335, (2019)
  • [6] BENNIN K E, KEUNG J, PHANNACHITTA P, et al., Mahakil: Diversity based oversampling approach to alleviate the class imbalance issue in software defect prediction, IEEE Transactions on Software Engineering, 44, 6, pp. 534-550, (2017)
  • [7] HUDA S, LIU K, ABDELRAZEK M, et al., An ensemble oversampling model for class imbalance problem in software defect prediction, IEEE Access, 6, pp. 24184-24195, (2018)
  • [8] KOVACS G., An empirical comparison and evaluation of minority oversampling techniques on a large number of imbalanced datasets, Applied Soft Computing, 83, (2019)
  • [9] CHAWLA N V, BOWYER K W, HALL L O, et al., SMOTE: synthetic minority over-sampling technique, Journal of Artificial Intelligence Research, 16, 1, pp. 321-357, (2002)
  • [10] DOUZAS G, BACAO F, LAST F., Improving imbalanced learning through a heuristic oversampling method based on K-means and SMOTE, Information Sciences, 465, pp. 1-20, (2018)