Evaluating defect prediction approaches: a benchmark and an extensive comparison

Cited by: 0

Authors:

Marco D’Ambros

Michele Lanza

Romain Robbes

Affiliations:

[1] University of Lugano, REVEAL @ Faculty of Informatics

[2] University of Chile, PLEIAD Lab @ Computer Science Department (DCC)

Source:

Empirical Software Engineering | 2012 / Volume 17

Keywords:

Defect prediction; Source code metrics; Change metrics

DOI:

Not available

Abstract:

Reliably predicting software defects is one of the holy grails of software engineering. Researchers have devised and implemented a plethora of defect/bug prediction approaches varying in terms of accuracy, complexity and the input data they require. However, the absence of an established benchmark makes it hard, if not impossible, to compare approaches. We present a benchmark for defect prediction, in the form of a publicly available dataset consisting of several software systems, and provide an extensive comparison of well-known bug prediction approaches, together with novel approaches we devised. We evaluate the performance of the approaches using different performance indicators: classification of entities as defect-prone or not, ranking of the entities, with and without taking into account the effort to review an entity. We performed three sets of experiments aimed at (1) comparing the approaches across different systems, (2) testing whether the differences in performance are statistically significant, and (3) investigating the stability of approaches across different learners. Our results indicate that, while some approaches perform better than others in a statistically significant manner, external validity in defect prediction is still an open problem, as generalizing results to different contexts/learners proved to be a partially unsuccessful endeavor.

Pages: 531-577

Page count: 46
