Evaluating defect prediction approaches: a benchmark and an extensive comparison

Authors
Marco D’Ambros
Michele Lanza
Romain Robbes
Affiliations
[1] University of Lugano, REVEAL @ Faculty of Informatics
[2] University of Chile, PLEIAD Lab @ Computer Science Department (DCC)
Keywords
Defect prediction; Source code metrics; Change metrics
Abstract
Reliably predicting software defects is one of the holy grails of software engineering. Researchers have devised and implemented a plethora of defect/bug prediction approaches that vary in accuracy, complexity, and the input data they require. However, the absence of an established benchmark makes it hard, if not impossible, to compare approaches. We present a benchmark for defect prediction, in the form of a publicly available dataset consisting of several software systems, and provide an extensive comparison of well-known bug prediction approaches, together with novel approaches we devised. We evaluate the performance of the approaches using different performance indicators: classification of entities as defect-prone or not, and ranking of entities, both with and without taking into account the effort required to review an entity. We performed three sets of experiments aimed at (1) comparing the approaches across different systems, (2) testing whether the differences in performance are statistically significant, and (3) investigating the stability of approaches across different learners. Our results indicate that, while some approaches perform better than others in a statistically significant manner, external validity in defect prediction is still an open problem, as generalizing results to different contexts/learners proved to be a partially unsuccessful endeavor.
Pages: 531-577