CLAMI: Defect Prediction on Unlabeled Datasets

Cited by: 140

Authors:

Nam, Jaechang [1]

Kim, Sunghun [1]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Hong Kong, Peoples R China

Source:

2015 30TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING (ASE) | 2015

Keywords:

STATIC CODE ATTRIBUTES; SOFTWARE; FAULTS; SELECTION; METRICS;

DOI:

10.1109/ASE.2015.56

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Defect prediction on new projects or projects with limited historical data is an interesting problem in software engineering. This is largely because it is difficult to collect defect information to label a dataset for training a prediction model. Cross-project defect prediction (CPDP) has tried to address this problem by reusing prediction models built by other projects that have enough historical data. However, CPDP does not always build a strong prediction model because of the different distributions among datasets. Approaches for defect prediction on unlabeled datasets have also tried to address the problem by adopting unsupervised learning but it has one major limitation, the necessity for manual effort. In this study, we propose novel approaches, CLA and CLAMI, that show the potential for defect prediction on unlabeled datasets in an automated manner without need for manual effort. The key idea of the CLA and CLAMI approaches is to label an unlabeled dataset by using the magnitude of metric values. In our empirical study on seven open-source projects, the CLAMI approach led to the promising prediction performances, 0.636 and 0.723 in average f-measure and AUC, that are comparable to those of defect prediction based on supervised learning.
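The abstract states only that CLA and CLAMI label an unlabeled dataset by using the magnitude of metric values; it does not give the procedure. The following is a minimal sketch of that general idea, not the authors' algorithm. The per-metric median cutoff, the grouping by score, the "upper half" rule, and the name `label_by_metric_magnitude` are all assumptions made for illustration.

```python
from statistics import median


def label_by_metric_magnitude(rows):
    """Label instances as defect-prone (True) or not (False) without any labels.

    rows: list of equal-length lists of numeric metric values,
          one inner list per source-code instance (hypothetical input shape).
    """
    if not rows:
        return []
    n_metrics = len(rows[0])

    # ASSUMPTION: the cutoff for each metric is its median over all instances.
    cutoffs = [median(row[j] for row in rows) for j in range(n_metrics)]

    # Score each instance by how many of its metric values exceed the cutoff.
    scores = [
        sum(1 for j in range(n_metrics) if row[j] > cutoffs[j])
        for row in rows
    ]

    # ASSUMPTION: instances sharing a score form a group, and the upper half
    # of the distinct scores is treated as defect-prone.
    distinct = sorted(set(scores))
    upper = set(distinct[len(distinct) // 2:])
    return [score in upper for score in scores]


if __name__ == "__main__":
    # Four hypothetical instances, three hypothetical metrics each.
    data = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [10, 10, 10]]
    print(label_by_metric_magnitude(data))  # [False, False, True, True]
```

In this sketch, if every instance receives the same score, all instances are labeled defect-prone. A real implementation would need an explicit rule for that case.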

Citation

Pages: 452-463

Page count: 12

50 records in total

[21] CFIWSE: A Hybrid Preprocessing Approach for Defect Prediction on Imbalance Real-World Datasets
Xu, Jiaxi
Shang, Jingwei
Huang, Zhichang
2022 IEEE 22ND INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY, AND SECURITY COMPANION, QRS-C, 2022, : 392 - 401
[22] On the Reproducibility of Software Defect Datasets
Zhu, Hao-Nan
Rubio-Gonzalez, Cindy
2023 IEEE/ACM 45TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ICSE, 2023, : 2324 - 2335
[23] Unlabeled Data Improves Word Prediction
Loeff, Nicolas
Farhadi, Ali
Endres, Ian
Forsyth, David A.
2009 IEEE 12TH INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2009, : 956 - 962
[24] StyleDiff: Attribute comparison between unlabeled datasets in latent disentangled space
Kawano, Keisuke
Kutsuna, Takuro
Tokuhisa, Ryoko
Nakamura, Akihiro
Esaki, Yasushi
IMAGE AND VISION COMPUTING, 2023, 138
[25] So You Need More Method Level Datasets for Your Software Defect Prediction?: Voila!
Shippey, Thomas
Hall, Tracy
Counsell, Steve
Bowes, David
ESEM'16: PROCEEDINGS OF THE 10TH ACM/IEEE INTERNATIONAL SYMPOSIUM ON EMPIRICAL SOFTWARE ENGINEERING AND MEASUREMENT, 2016,
[26] Learning from Software defect datasets
Singh, Pradeep
PROCEEDINGS OF 2019 5TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, COMPUTING AND CONTROL (ISPCC 2K19), 2019, : 58 - 63
[27] Metal Surface Defect Detection Based on Few Defect Datasets
Li, Ruoming
2019 5TH INTERNATIONAL CONFERENCE ON GREEN POWER, MATERIALS AND MANUFACTURING TECHNOLOGY AND APPLICATIONS (GPMMTA 2019), 2019, 2185
[28] Comprehensive Bibliographic Survey and Forward-Looking Recommendations for Software Defect Prediction: Datasets, Validation Methodologies, Prediction Approaches, and Tools
Mustaqeem, Mohd
Alam, Mahfooz
Mustajab, Suhel
Alshanketi, Faisal
Alam, Shadab
Shuaib, Mohammed
IEEE ACCESS, 2025, 13 : 866 - 903
[29] Improving Chemical Reaction Prediction with Unlabeled Data
Xie, Yu
Zhang, Yuyang
Wong, Ka-Chun
Shi, Meixia
Peng, Chengbin
MOLECULES, 2022, 27 (18):
[30] Software Fault Prediction of Unlabeled Program Modules
Catal, C.
Sevim, U.
Diri, B.
WORLD CONGRESS ON ENGINEERING 2009, VOLS I AND II, 2009, : 212 - +
