Evaluating defect prediction approaches: a benchmark and an extensive comparison

Cited by: 0

Authors:

Marco D’Ambros

Michele Lanza

Romain Robbes

Affiliations:

[1] University of Lugano, REVEAL @ Faculty of Informatics

[2] University of Chile, PLEIAD Lab @ Computer Science Department (DCC)

Source:

Empirical Software Engineering | 2012 / Vol. 17

Keywords:

Defect prediction; Source code metrics; Change metrics

DOI:

Not available

Abstract:

Reliably predicting software defects is one of the holy grails of software engineering. Researchers have devised and implemented a plethora of defect/bug prediction approaches varying in terms of accuracy, complexity and the input data they require. However, the absence of an established benchmark makes it hard, if not impossible, to compare approaches. We present a benchmark for defect prediction, in the form of a publicly available dataset consisting of several software systems, and provide an extensive comparison of well-known bug prediction approaches, together with novel approaches we devised. We evaluate the performance of the approaches using different performance indicators: classification of entities as defect-prone or not, ranking of the entities, with and without taking into account the effort to review an entity. We performed three sets of experiments aimed at (1) comparing the approaches across different systems, (2) testing whether the differences in performance are statistically significant, and (3) investigating the stability of approaches across different learners. Our results indicate that, while some approaches perform better than others in a statistically significant manner, external validity in defect prediction is still an open problem, as generalizing results to different contexts/learners proved to be a partially unsuccessful endeavor.

Pages: 531 - 577

Number of pages: 46

50 results in total

[41] Comparison of various surgical approaches for extensive bilateral colorectal liver metastases
Reissfelder, Christoph
Rahbari, Nuh N.
Urrutia Bejarano, L.
Schmidt, Thomas
Kortes, Nikolas
Kauczor, Hans-Ulrich
Büchler, Markus W.
Weitz, Jürgen
Koch, Moritz
Langenbeck's Archives of Surgery, 2014, 399 : 481 - 491
[42] A gait phase prediction model trained on benchmark datasets for evaluating a controller for prosthetic legs
Kim, Minjae
Hargrove, Levi J.
FRONTIERS IN NEUROROBOTICS, 2023, 16
[43] A comparison of approaches to the prediction of surface wave amplitude
Dalton, Colleen A.
Hjoerleifsdottir, Vala
Ekstroem, Goeran
GEOPHYSICAL JOURNAL INTERNATIONAL, 2014, 196 (01) : 386 - 404
[44] Analysis of benchmark characteristics and benchmark performance prediction
Saavedra, RH
Smith, AJ
ACM TRANSACTIONS ON COMPUTER SYSTEMS, 1996, 14 (04) : 344 - 384
[45] A Comparison Framework of Classification Models for Software Defect Prediction
Wahono, Romi Satria
Herman, Nanna Suryana
Ahmad, Sabrina
ADVANCED SCIENCE LETTERS, 2014, 20 (10-12) : 1945 - 1950
[46] A Benchmark for Evaluating FTLE Computations
Kuhn, Alexander
Roessl, Christian
Weinkauf, Tino
Theisel, Holger
IEEE PACIFIC VISUALIZATION SYMPOSIUM 2012, 2012 : 121 - 128
[47] PDEBENCH: An Extensive Benchmark for Scientific Machine Learning
Takamoto, Makoto
Praditia, Timothy
Leiteritz, Raphael
MacKinlay, Dan
Alesiani, Francesco
Pflueger, Dirk
Niepert, Mathias
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
[48] Federated brain tumor segmentation: An extensive benchmark
Manthe, Matthis
Duffner, Stefan
Lartizien, Carole
MEDICAL IMAGE ANALYSIS, 2024, 97
[49] An extensive numerical benchmark of the various magnetohydrodynamic flows
Blishchik, Artem
van der Lans, Mike
Kenjeres, Sasa
INTERNATIONAL JOURNAL OF HEAT AND FLUID FLOW, 2021, 90
[50] Understanding hot interconnects with an extensive benchmark survey
Li, Y.
Qi, H.
Lu, G.
Jin, F.
Guo, Y.
Lu, X.
BenchCouncil Transactions on Benchmarks, Standards and Evaluations, 2022, 2 (03)
