Evaluating defect prediction approaches: a benchmark and an extensive comparison

Cited by: 0
Authors
Marco D’Ambros
Michele Lanza
Romain Robbes
Affiliations
[1] University of Lugano, REVEAL @ Faculty of Informatics
[2] University of Chile, PLEIAD Lab @ Computer Science Department (DCC)
Source
Empirical Software Engineering, 2012, 17(4-5)
Keywords
Defect prediction; Source code metrics; Change metrics
DOI
Not available
Abstract
Reliably predicting software defects is one of the holy grails of software engineering. Researchers have devised and implemented a plethora of defect/bug prediction approaches varying in terms of accuracy, complexity and the input data they require. However, the absence of an established benchmark makes it hard, if not impossible, to compare approaches. We present a benchmark for defect prediction, in the form of a publicly available dataset consisting of several software systems, and provide an extensive comparison of well-known bug prediction approaches, together with novel approaches we devised. We evaluate the performance of the approaches using different performance indicators: the classification of entities as defect-prone or not, and the ranking of entities, with and without taking into account the effort needed to review an entity. We performed three sets of experiments aimed at (1) comparing the approaches across different systems, (2) testing whether the differences in performance are statistically significant, and (3) investigating the stability of approaches across different learners. Our results indicate that, while some approaches perform better than others in a statistically significant manner, external validity in defect prediction is still an open problem, as generalizing results to different contexts/learners proved to be a partially unsuccessful endeavor.
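As a concrete illustration of the performance indicators described in the abstract, the following is a minimal sketch, assuming scikit-learn, SciPy, and synthetic data: it classifies entities as defect-prone or not, ranks them by predicted risk, and repeats the ranking with a simple effort proxy. The two metrics (lines of code and past changes) and the LOC-based effort proxy are hypothetical stand-ins, not the benchmark's actual features or the authors' pipeline.

```python
# Minimal sketch of the three kinds of performance indicators, assuming
# scikit-learn and SciPy. All data, metrics, and the effort proxy are
# synthetic placeholders, not the benchmark's actual features.
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic "entities" (e.g., classes) with two illustrative metrics:
# a source code metric (lines of code) and a change metric (past changes).
n = 500
loc = rng.lognormal(mean=5.0, sigma=1.0, size=n)       # lines of code
churn = rng.poisson(lam=loc / 100.0)                   # number of past changes
defects = rng.poisson(lam=0.002 * loc + 0.3 * churn)   # post-release defects
X = np.column_stack([loc, churn])
y = (defects > 0).astype(int)                          # defect-prone or not

X_tr, X_te, y_tr, y_te, d_tr, d_te = train_test_split(
    X, y, defects, test_size=0.3, random_state=0)

# (1) Classification indicator: score entities as defect-prone or not (AUC).
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
risk = clf.predict_proba(X_te)[:, 1]
print("classification AUC:", roc_auc_score(y_te, risk))

# (2) Ranking indicator: agreement between the predicted ranking of
# entities and their actual defect counts.
rho, _ = spearmanr(risk, d_te)
print("ranking (Spearman rho):", rho)

# (3) Effort-aware variant: rank by predicted risk per unit of review
# effort, using LOC as a crude effort proxy (an assumption of this sketch).
rho_e, _ = spearmanr(risk / X_te[:, 0], d_te)
print("effort-aware ranking (Spearman rho):", rho_e)
```

The actual study evaluates many approaches and several learners per system and tests the differences for statistical significance; this sketch only mirrors the overall structure of such an evaluation on one synthetic system.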
Pages: 531-577
Page count: 46
Related papers
10 of 50 shown
  • [1] Evaluating defect prediction approaches: a benchmark and an extensive comparison
    D'Ambros, Marco
    Lanza, Michele
    Robbes, Romain
    EMPIRICAL SOFTWARE ENGINEERING, 2012, 17 (4-5) : 531 - 577
  • [2] A Comparative Study to Benchmark Cross-project Defect Prediction Approaches
    Herbold, Steffen
    Trautsch, Alexander
    Grabowski, Jens
    PROCEEDINGS 2018 IEEE/ACM 40TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE), 2018, : 1063 - 1063
  • [3] A Comparative Study to Benchmark Cross-Project Defect Prediction Approaches
    Herbold, Steffen
    Trautsch, Alexander
    Grabowski, Jens
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2018, 44 (09) : 811 - 833
  • [4] Evaluating benchmark subsetting approaches
    Yi, Joshua J.
    Sendag, Resit
    Eeckhout, Lieven
    Joshi, Ajay
    Lilja, David J.
    John, Lizy K.
    PROCEEDINGS OF THE IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION, 2006, : 93+
  • [5] Benchmark for Evaluating Pedestrian Action Prediction
    Kotseruba, Iuliia
    Rasouli, Amir
    Tsotsos, John K.
    2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021, : 1257 - 1267
  • [6] Evaluating Defect Prediction Approaches Using A Massive Set of Metrics: An Empirical Study
    Xuan, Xiao
    Lo, David
    Xia, Xin
    Tian, Yuan
    30TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, VOLS I AND II, 2015, : 1644 - 1647
  • [7] A comprehensive computational benchmark for evaluating deep learning-based protein function prediction approaches
    Wang, Wenkang
    Shuai, Yunyan
    Yang, Qiurong
    Zhang, Fuhao
    Zeng, Min
    Li, Min
    BRIEFINGS IN BIOINFORMATICS, 2024, 25 (02)
  • [8] A Comparison of Semi-Supervised Classification Approaches for Software Defect Prediction
    Catal, Cagatay
    JOURNAL OF INTELLIGENT SYSTEMS, 2014, 23 (01) : 75 - 82
  • [9] Comparison of Selected Portfolio Approaches with Benchmark
    Nedela, David
    38TH INTERNATIONAL CONFERENCE ON MATHEMATICAL METHODS IN ECONOMICS (MME 2020), 2020, : 389 - 395
  • [10] Building a Benchmark for Evaluating Link Prediction Methods
    Xiao, Junyan
    Wang, Peng
    Meng, Yue
    2018 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING (ASONAM), 2018, : 1065 - 1070