CLAMI: Defect Prediction on Unlabeled Datasets

Cited by: 140

Authors:

Nam, Jaechang [1]

Kim, Sunghun [1]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Hong Kong, Peoples R China

Source:

2015 30TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING (ASE) | 2015

Keywords:

STATIC CODE ATTRIBUTES; SOFTWARE; FAULTS; SELECTION; METRICS;

DOI:

10.1109/ASE.2015.56

Chinese Library Classification number:

TP31 [Computer software]

Discipline classification codes:

081202; 0835

Abstract:

Defect prediction on new projects or projects with limited historical data is an interesting problem in software engineering. This is largely because it is difficult to collect defect information to label a dataset for training a prediction model. Cross-project defect prediction (CPDP) has tried to address this problem by reusing prediction models built from other projects that have enough historical data. However, CPDP does not always build a strong prediction model because of the different distributions among datasets. Approaches for defect prediction on unlabeled datasets have also tried to address the problem by adopting unsupervised learning, but they have one major limitation: the necessity for manual effort. In this study, we propose novel approaches, CLA and CLAMI, that show the potential for defect prediction on unlabeled datasets in an automated manner without the need for manual effort. The key idea of the CLA and CLAMI approaches is to label an unlabeled dataset by using the magnitude of metric values. In our empirical study on seven open-source projects, the CLAMI approach led to promising prediction performance, 0.636 and 0.723 in average f-measure and AUC respectively, which is comparable to that of defect prediction based on supervised learning.
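
The abstract only states that labels are derived from the magnitude of metric values; it does not give the procedure. The following is a minimal Python sketch of one way such magnitude-based labeling could work, not the authors' implementation. The function name `label_by_magnitude`, the per-metric median cutoff, and the rule that the upper half of the score groups is labeled buggy are all assumptions made for illustration.

```python
from statistics import median


def label_by_magnitude(rows):
    """Label each instance buggy (True) or clean (False) without training labels.

    rows: list of equal-length lists of numeric metric values, one per instance.
    All cutoff choices below are illustrative assumptions.
    """
    n_metrics = len(rows[0])
    # Assumed cutoff: the median of each metric over all instances.
    cutoffs = [median(row[j] for row in rows) for j in range(n_metrics)]
    # Score = number of metrics on which the instance exceeds its cutoff.
    scores = [
        sum(1 for j in range(n_metrics) if row[j] > cutoffs[j])
        for row in rows
    ]
    # Instances with the same score form one group; the upper half of the
    # groups (by score) is labeled buggy. This split rule is an assumption.
    distinct = sorted(set(scores))
    buggy_scores = set(distinct[len(distinct) // 2:])
    return [score in buggy_scores for score in scores]


# Hypothetical data: four instances, three metrics each.
example = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [10, 10, 10]]
print(label_by_magnitude(example))  # [False, False, True, True]
```

In this sketch, instances whose metric values are mostly above the median receive the buggy label, which reflects the intuition that larger metric values (for example, higher complexity) tend to accompany defects. If every instance gets the same score, the sketch labels all of them buggy, so a real implementation would need a more careful split rule.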

Pages: 452-463

Page count: 12

50 items in total

[1] Defect Prediction on Unlabeled Datasets by Using Unsupervised Clustering
Yang, Jun
Qian, Hongbing
PROCEEDINGS OF 2016 IEEE 18TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS; IEEE 14TH INTERNATIONAL CONFERENCE ON SMART CITY; IEEE 2ND INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS), 2016, : 465 - 472
[2] Snoring: a Noise in Defect Prediction Datasets
Ahluwalia, Aalok
Falessi, Davide
Di Penta, Massimiliano
2019 IEEE/ACM 16TH INTERNATIONAL CONFERENCE ON MINING SOFTWARE REPOSITORIES (MSR 2019), 2019, : 63 - 67
[3] A Study of Redundant Metrics in Defect Prediction Datasets
Jiarpakdee, Jirayus
Tantithamthavorn, Chakkrit
Ihara, Akinori
Matsumoto, Kenichi
2016 IEEE 27TH INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING WORKSHOPS (ISSREW), 2016, : 51 - 52
[4] Software Defect Prediction on Unlabelled Datasets: A Comparative Study
Ronchieri, Elisabetta
Canaparo, Marco
Belgiovine, Mauro
COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2020, PT II, 2020, 12250 : 333 - 353
[5] Improving Software Defect Prediction in Noisy Imbalanced Datasets
Shi, Haoxiang
Ai, Jun
Liu, Jingyu
Xu, Jiaxi
APPLIED SCIENCES-BASEL, 2023, 13 (18):
[6] Inheritance metrics feats in unsupervised learning to classify unlabeled datasets and clusters in fault prediction
Aziz, Syed Rashid
Khan, Tamim Ahmed
Nadeem, Aamer
PEERJ COMPUTER SCIENCE, 2021, 7
[7] Inheritance metrics feats in unsupervised learning to classify unlabeled datasets and clusters in fault prediction
Aziz S.R.
Khan T.A.
Nadeem A.
PeerJ Computer Science, 2021, 7
[8] Automatic Evaluation of Cluster in Unlabeled Datasets
Krishnamoorthi, M.
INFORMATION AND NETWORK TECHNOLOGY, 2011, 4 : 120 - 124
[9] The Consolidated Tree Construction Algorithm in Imbalanced Defect Prediction Datasets
Ibarguren, Igor
Perez, Jesus M.
Mugerza, Javier
Rodriguez, Daniel
Harrison, Rachel
2017 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2017, : 2656 - 2660
[10] An approach to software defect prediction for small-sized datasets
Bal, Pravas Ranjan
Shukla, Suyash
Kumar, Sandeep
APPLIED INTELLIGENCE, 2025, 55 (06)
