The Patch Overfitting Problem in Automated Program Repair: Practical Magnitude and a Baseline for Realistic Benchmarking

Cited by: 0
Authors
Petke, Justyna [1]
Martinez, Matias [2]
Kechagia, Maria [1]
Aleti, Aldeida [3]
Sarro, Federica [1]
Affiliations
[1] UCL, London, England
[2] Univ Politecn Cataluna, BarcelonaTech, Barcelona, Spain
[3] Monash Univ, Melbourne, Vic, Australia
Funding
Australian Research Council; UK Engineering and Physical Sciences Research Council;
Keywords
Overfitting; Automated Program Repair; Patch Assessment;
DOI
10.1145/3663529.3663776
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
Automated program repair techniques aim to generate patches for software bugs, mainly relying on testing to check their validity. The generation of a large number of such plausible yet incorrect patches is widely believed to hinder wider application of APR in practice, which has motivated research in automated patch assessment. We reflect on the validity of this motivation and carry out an empirical study to analyse the extent to which 10 APR tools suffer from the overfitting problem in practice. We observe that the number of plausible patches generated by any of the APR tools analysed for a given bug from the Defects4J dataset is remarkably low, a median of 2, indicating that a developer only needs to consider 2 patches in most cases to be confident of finding a fix or confirming that none exists. This study reveals that the overfitting problem might not be as bad as previously thought. We reflect on current evaluation strategies of automated patch assessment techniques and propose a Random Selection baseline to assess whether and when using such techniques is beneficial for reducing human effort. We advocate that future work evaluate the benefit of patch overfitting assessment against the random baseline.
Pages: 452-456
Page count: 5