Evaluating defect prediction approaches: a benchmark and an extensive comparison

Authors
Marco D’Ambros
Michele Lanza
Romain Robbes
Affiliations
[1] University of Lugano, REVEAL @ Faculty of Informatics
[2] University of Chile, PLEIAD Lab @ Computer Science Department (DCC)
Keywords
Defect prediction; Source code metrics; Change metrics
Abstract
Reliably predicting software defects is one of the holy grails of software engineering. Researchers have devised and implemented a plethora of defect/bug prediction approaches varying in terms of accuracy, complexity and the input data they require. However, the absence of an established benchmark makes it hard, if not impossible, to compare approaches. We present a benchmark for defect prediction, in the form of a publicly available dataset consisting of several software systems, and provide an extensive comparison of well-known bug prediction approaches, together with novel approaches we devised. We evaluate the performance of the approaches using different performance indicators: classification of entities as defect-prone or not, ranking of the entities, with and without taking into account the effort to review an entity. We performed three sets of experiments aimed at (1) comparing the approaches across different systems, (2) testing whether the differences in performance are statistically significant, and (3) investigating the stability of approaches across different learners. Our results indicate that, while some approaches perform better than others in a statistically significant manner, external validity in defect prediction is still an open problem, as generalizing results to different contexts/learners proved to be a partially unsuccessful endeavor.
Pages: 531-577
Page count: 46