Cost of Flaky Tests in Continuous Integration: An Industrial Case Study

Authors
Leinen, Fabian [1]
Elsner, Daniel [1]
Pretschner, Alexander [1]
Stahlbauer, Andreas [2]
Sailer, Michael [2]
Juergens, Elmar [2]
Affiliations
[1] Tech Univ Munich, Munich, Germany
[2] CQSE GmbH, Munich, Germany
Keywords
flaky tests; continuous integration; regression testing; cost modeling; industrial case study;
DOI
10.1109/ICST60714.2024.00037
Chinese Library Classification number
TP31 [Computer Software];
Subject classification code
081202; 0835;
Abstract
Researchers and practitioners alike increasingly often perceive flaky tests as a major challenge in software engineering. They spend a lot of effort trying to detect, repair, and mitigate the negative effects of flaky tests. However, it is yet unclear where and to what extent the costs of flaky tests manifest in industrial Continuous Integration (CI) development processes. In this study, we compile cost factors introduced by flaky tests in CI development from research and practice and derive a cost model that allows gaining insight into the costs incurred. We then instantiate this model in a case study of a large, commercial software project with ~30 developers and ~1M SLoC. We analyze five years of development history, including CI test logs, commits from the Version Control System (VCS), issue tickets, and tracked work time to quantify the cost factors implied by flaky tests. We find that the time spent dealing with flaky tests in the studied project represents at least 2.5% of the productive developer time. This effort is divided into investigating potentially flaky test failures, which accounts for 1.1% of the total time spent, repairing flaky tests adds another 1.3%, and developing tools to monitor flaky tests adds 0.1%. Contrary to most other studies, we find the cost for rerunning tests to be negligible and inexpensive. Automatically rerunning a test costs 0.02 cents, while not rerunning and thus letting the pipeline fail results in a manual investigation costing $5.67 in our context. The insights gained from our case study have led to the decision to shift effort from investigation and repair to automatically rerunning tests. Our cost model can help practitioners analyze the cost of flaky tests in their context and make informed decisions. Furthermore, our case study provides a first step to better understand the costs of flaky tests, which can lead researchers to industry-relevant problems.
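The arithmetic behind the figures in the abstract can be checked with a short sketch. It is not the cost model from the paper: it only reuses the numbers quoted above (0.02 cents per automatic rerun, $5.67 per manual investigation, and the 1.1% / 1.3% / 0.1% time shares). The helper names `policy_costs` and `break_even_reruns` are hypothetical, and the sketch assumes that every rerun settles the failure and that a rerun never hides a real defect.

```python
# Figures quoted in the abstract; everything else here is an illustrative assumption.
RERUN_COST_USD = 0.02 / 100      # 0.02 cents per automatic rerun
INVESTIGATION_COST_USD = 5.67    # one manual investigation of a failed pipeline

# Share of productive developer time per activity, in percent.
TIME_SHARES = {"investigate": 1.1, "repair": 1.3, "tooling": 0.1}


def policy_costs(flaky_failures: int, reruns_per_failure: int = 1) -> dict:
    """Hypothetical helper: total cost in USD of handling flaky failures
    by rerunning versus by investigating each one manually."""
    rerun = flaky_failures * reruns_per_failure * RERUN_COST_USD
    investigate = flaky_failures * INVESTIGATION_COST_USD
    return {"rerun": rerun, "investigate": investigate}


def break_even_reruns() -> float:
    """Number of reruns that cost as much as one manual investigation."""
    return INVESTIGATION_COST_USD / RERUN_COST_USD


if __name__ == "__main__":
    print("total share of developer time (%):", round(sum(TIME_SHARES.values()), 1))
    print("cost for 1000 flaky failures (USD):", policy_costs(1000))
    print("reruns per investigation at break-even:", round(break_even_reruns()))
```

Under these assumptions a single investigation costs as much as tens of thousands of reruns, which is consistent with the reported decision to shift effort toward automatic reruns.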
Pages: 329 - 340
Number of pages: 12