Evaluating defect prediction approaches: a benchmark and an extensive comparison

Authors
Marco D’Ambros
Michele Lanza
Romain Robbes
Affiliations
[1] University of Lugano, REVEAL @ Faculty of Informatics
[2] University of Chile, PLEIAD Lab @ Computer Science Department (DCC)
Keywords
Defect prediction; Source code metrics; Change metrics
DOI
Not available
Abstract
Reliably predicting software defects is one of the holy grails of software engineering. Researchers have devised and implemented a plethora of defect/bug prediction approaches varying in terms of accuracy, complexity and the input data they require. However, the absence of an established benchmark makes it hard, if not impossible, to compare approaches. We present a benchmark for defect prediction, in the form of a publicly available dataset consisting of several software systems, and provide an extensive comparison of well-known bug prediction approaches, together with novel approaches we devised. We evaluate the performance of the approaches using different performance indicators: classification of entities as defect-prone or not, ranking of the entities, with and without taking into account the effort to review an entity. We performed three sets of experiments aimed at (1) comparing the approaches across different systems, (2) testing whether the differences in performance are statistically significant, and (3) investigating the stability of approaches across different learners. Our results indicate that, while some approaches perform better than others in a statistically significant manner, external validity in defect prediction is still an open problem, as generalizing results to different contexts/learners proved to be a partially unsuccessful endeavor.
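The abstract distinguishes between classifying entities as defect-prone, ranking them, and ranking them with review effort taken into account. The sketch below illustrates the first and last of these on toy data. The specific measures (area under the ROC curve, and the share of defects found within a fixed effort budget), the function names, and the use of lines of code as the effort proxy are all assumptions made for illustration; they are not taken from this record and may differ from the indicators used in the paper.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Classification indicator (assumed): probability that a randomly chosen
    defective entity is scored above a randomly chosen clean one.
    Ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defective and one clean entity")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def defects_caught(
    scores: Sequence[float],
    defects: Sequence[int],
    effort: Sequence[float],
    budget: float = 0.2,
) -> float:
    """Effort-aware indicator (assumed): review entities in descending order
    of predicted score per unit of effort, stop before exceeding `budget`
    (a fraction of total effort), and return the share of defects found."""
    order = sorted(
        range(len(scores)), key=lambda i: scores[i] / effort[i], reverse=True
    )
    limit = budget * sum(effort)
    spent = 0.0
    caught = 0
    for i in order:
        if spent + effort[i] > limit:
            break
        spent += effort[i]
        caught += defects[i]
    return caught / sum(defects)


# Hypothetical toy data: four entities with predicted scores,
# actual defect counts, and size in lines of code as the effort proxy.
scores = [0.9, 0.8, 0.3, 0.1]
defects = [3, 0, 1, 0]
loc = [900, 100, 100, 400]
labels = [1 if d > 0 else 0 for d in defects]

print(auc(scores, labels))
print(defects_caught(scores, defects, loc, budget=0.2))
```

The toy data shows why the two views can disagree: the large entity holding most of the defects is scored highest, yet it does not fit in a 20% effort budget, so an effort-aware indicator rewards the model less than a plain classification measure does.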
Pages: 531-577
Page count: 46