CLAMI: Defect Prediction on Unlabeled Datasets

Times cited: 140

Authors:

Nam, Jaechang [1]

Kim, Sunghun [1]

Affiliation:

[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Hong Kong, Peoples R China

Source:

2015 30TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING (ASE) | 2015

Keywords:

STATIC CODE ATTRIBUTES; SOFTWARE; FAULTS; SELECTION; METRICS

DOI:

10.1109/ASE.2015.56

Chinese Library Classification (CLC) number:

TP31 [Computer software]

Subject classification codes:

081202; 0835

Abstract:

Defect prediction on new projects, or on projects with limited historical data, is an interesting problem in software engineering, largely because it is difficult to collect the defect information needed to label a dataset for training a prediction model. Cross-project defect prediction (CPDP) has tried to address this problem by reusing prediction models built from other projects that have enough historical data. However, CPDP does not always yield a strong prediction model, because the distributions of different datasets differ. Approaches for defect prediction on unlabeled datasets have also tried to address the problem by adopting unsupervised learning, but they have one major limitation: they require manual effort. In this study, we propose two novel approaches, CLA and CLAMI, that show the potential for defect prediction on unlabeled datasets in an automated manner, without the need for manual effort. The key idea of CLA and CLAMI is to label an unlabeled dataset by using the magnitude of metric values. In our empirical study on seven open-source projects, the CLAMI approach achieved promising prediction performance, with an average f-measure of 0.636 and an average AUC of 0.723, comparable to that of defect prediction based on supervised learning.
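The abstract states only that CLA and CLAMI label an unlabeled dataset by using the magnitude of metric values. The Python sketch below illustrates one way such a labeling step could work, under assumptions not stated in this record: a metric value counts as "high" when it exceeds that metric's median, instances are grouped into clusters by how many of their metrics are high (K), and clusters in the upper half of the ranked K values are labeled buggy. The function name `cla_labels` and the toy data are hypothetical, and any additional steps that distinguish CLAMI from CLA are not shown.

```python
from statistics import median


def cla_labels(rows):
    """Hypothetical sketch of a CLA-style labeling step (not the authors' code).

    rows: list of instances, each a list of numeric metric values of equal length.
    Returns one label per instance: 1 = labeled buggy, 0 = labeled clean.
    """
    if not rows:
        return []
    n_metrics = len(rows[0])
    # Assumption: a metric value is "high" when it exceeds that metric's median.
    cutoffs = [median(row[j] for row in rows) for j in range(n_metrics)]
    # K = how many of an instance's metric values are high.
    ks = [sum(1 for j in range(n_metrics) if row[j] > cutoffs[j]) for row in rows]
    # Clustering: instances that share the same K form one cluster.
    clusters = sorted(set(ks))
    if len(clusters) == 1:
        # Assumption: with a single cluster there is nothing to rank, so label all clean.
        return [0] * len(rows)
    # Labeling (assumption): clusters in the upper half of the ranked K values are buggy;
    # with an odd number of clusters, the middle one falls in the upper half.
    buggy_ks = set(clusters[len(clusters) // 2:])
    return [1 if k in buggy_ks else 0 for k in ks]


if __name__ == "__main__":
    # Four toy instances with three metrics each (made-up values, not data from the paper).
    rows = [[10, 200, 5], [1, 20, 0], [7, 150, 3], [2, 30, 1]]
    print(cla_labels(rows))  # [1, 0, 1, 0]
```

The median cutoff and the upper-half split are simply the choices this sketch makes to turn "magnitude of metric values" into concrete labels; a different cutoff (for example another percentile) would plug into the same structure.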


Pages: 452-463

Page count: 12
