Cost of Flaky Tests in Continuous Integration: An Industrial Case Study

Cited by: 0

Authors:

Leinen, Fabian [1]

Elsner, Daniel [1]

Pretschner, Alexander [1]

Stahlbauer, Andreas [2]

Sailer, Michael [2]

Juergens, Elmar [2]

Affiliations:

[1] Tech Univ Munich, Munich, Germany

[2] CQSE GmbH, Munich, Germany

Source:

2024 IEEE CONFERENCE ON SOFTWARE TESTING, VERIFICATION AND VALIDATION, ICST 2024 | 2024

Keywords:

flaky tests; continuous integration; regression testing; cost modeling; industrial case study;

DOI:

10.1109/ICST60714.2024.00037

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Subject classification code:

081202 ; 0835 ;

Abstract:

Researchers and practitioners alike increasingly often perceive flaky tests as a major challenge in software engineering. They spend a lot of effort trying to detect, repair, and mitigate the negative effects of flaky tests. However, it is yet unclear where and to what extent the costs of flaky tests manifest in industrial Continuous Integration (CI) development processes. In this study, we compile cost factors introduced by flaky tests in CI development from research and practice and derive a cost model that allows gaining insight into the costs incurred. We then instantiate this model in a case study of a large, commercial software project with ~30 developers and ~1M SLoC. We analyze five years of development history, including CI test logs, commits from the Version Control System (VCS), issue tickets, and tracked work time to quantify the cost factors implied by flaky tests. We find that the time spent dealing with flaky tests in the studied project represents at least 2.5% of the productive developer time. This effort is divided into investigating potentially flaky test failures, which accounts for 1.1% of the total time spent, repairing flaky tests adds another 1.3%, and developing tools to monitor flaky tests adds 0.1%. Contrary to most other studies, we find the cost for rerunning tests to be negligible and inexpensive. Automatically rerunning a test costs 0.02 cents, while not rerunning and thus letting the pipeline fail results in a manual investigation costing $5.67 in our context. The insights gained from our case study have led to the decision to shift effort from investigation and repair to automatically rerunning tests. Our cost model can help practitioners analyze the cost of flaky tests in their context and make informed decisions. Furthermore, our case study provides a first step to better understand the costs of flaky tests, which can lead researchers to industry-relevant problems.
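
The arithmetic behind the figures quoted in the abstract can be checked with a short sketch. This is not the paper's cost model, which the abstract does not spell out; it only recombines the numbers stated above. The constant names and the helper `break_even_reruns` are hypothetical, chosen for illustration.

```python
# Figures quoted in the abstract. All names here are illustrative assumptions;
# this is not the authors' cost model.
RERUN_COST_USD = 0.0002          # 0.02 cents per automatic rerun
INVESTIGATION_COST_USD = 5.67    # one manual investigation of a failed pipeline

# Shares of productive developer time, in percent, as reported in the abstract.
EFFORT_SHARES_PCT = {
    "investigation": 1.1,
    "repair": 1.3,
    "monitoring_tools": 0.1,
}


def break_even_reruns(investigation_cost: float, rerun_cost: float) -> float:
    """Number of automatic reruns that cost as much as one manual investigation."""
    return investigation_cost / rerun_cost


# Sum of the three reported shares; should match the "at least 2.5%" total.
total_share_pct = sum(EFFORT_SHARES_PCT.values())
reruns_per_investigation = break_even_reruns(INVESTIGATION_COST_USD, RERUN_COST_USD)

print(f"total share of developer time: {total_share_pct:.1f}%")
print(f"reruns costing as much as one investigation: {reruns_per_investigation:,.0f}")
```

Under these quoted figures, a single manual investigation costs as much as tens of thousands of automatic reruns, which is consistent with the reported decision to shift effort toward rerunning.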

Pages: 329-340

Page count: 12
