Bug characterization in machine learning-based systems

Cited by: 0

Authors:

Mohammad Mehdi Morovati

Amin Nikanjam

Florian Tambon

Foutse Khomh

Zhen Ming (Jack) Jiang

Affiliations:

[1] Polytechnique Montréal, SWAT Lab.

[2] York University

Source:

Empirical Software Engineering | 2024 / Vol. 29

Keywords:

Software bug; Software testing; ML-based systems; ML bug; Deep learning; Software maintenance; Empirical study

DOI:

Not available

Abstract:

The rapid growth in applying Machine Learning (ML) across different domains, especially safety-critical ones, increases the need for reliable ML components, i.e., software components that operate based on ML. Since corrective maintenance, i.e., identifying and resolving system bugs, is a key task in the software development process for delivering reliable software components, it is necessary to investigate the usage of ML components from the software maintenance perspective. Understanding the characteristics of bugs and the maintenance challenges in ML-based systems can help developers of these systems identify where to focus maintenance and testing efforts, by giving insights into the most error-prone components, the most common bugs, etc. In this paper, we investigate the characteristics of bugs in ML-based software systems and the differences between ML and non-ML bugs from the maintenance viewpoint. We extracted 447,948 GitHub repositories that used one of the three most popular ML frameworks, i.e., TensorFlow, Keras, and PyTorch. After multiple filtering steps, we selected the top 300 repositories with the highest number of closed issues. We manually inspected the extracted repositories to exclude non-ML-based systems. Our investigation involved a manual inspection of 386 sampled reported issues in the identified ML-based systems to determine whether or not they affect ML components. Our analysis shows that nearly half of the real issues reported in ML-based systems are ML bugs, indicating that ML components are more error-prone than non-ML components. Next, we thoroughly examined 109 identified ML bugs to identify their root causes and symptoms and to calculate their required fixing time. The results also revealed that ML bugs have significantly different characteristics compared to non-ML bugs in terms of the complexity of bug fixing (number of commits, changed files, and changed lines of code).
Based on our results, fixing ML bugs is more costly, and ML components are more error-prone, than fixing non-ML bugs and non-ML components, respectively. Hence, paying significant attention to the reliability of ML components is crucial in ML-based systems. These results deepen the understanding of ML bugs, and we hope that our findings help shed light on opportunities for designing effective tools for testing and debugging ML-based systems.
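The abstract measures bug-fixing complexity by three quantities: the number of commits, changed files, and changed lines of code. The following is a minimal sketch of how those metrics could be computed for each fix and summarized for ML versus non-ML bugs. It assumes a hypothetical `Commit`/`BugFix` record format and hypothetical helpers `fix_complexity` and `median_complexity`; none of these are the authors' tooling. The medians are only a descriptive summary. The abstract's claim of a significant difference would additionally require a statistical test, which the abstract does not name.

```python
from dataclasses import dataclass, field
from statistics import median


@dataclass
class Commit:
    """One commit in a bug fix (hypothetical record format)."""
    files: list[str]  # paths of the files touched by this commit
    added: int        # lines added
    deleted: int      # lines deleted


@dataclass
class BugFix:
    """A resolved bug and the commits that fixed it (hypothetical record format)."""
    is_ml_bug: bool
    commits: list[Commit] = field(default_factory=list)


def fix_complexity(fix: BugFix) -> tuple[int, int, int]:
    """Return (number of commits, distinct changed files, changed lines) for one fix."""
    files = {path for commit in fix.commits for path in commit.files}
    lines = sum(commit.added + commit.deleted for commit in fix.commits)
    return len(fix.commits), len(files), lines


def median_complexity(fixes: list[BugFix], ml: bool) -> tuple[float, ...]:
    """Median of each complexity metric over ML (ml=True) or non-ML (ml=False) fixes."""
    rows = [fix_complexity(f) for f in fixes if f.is_ml_bug == ml]
    if not rows:
        raise ValueError("no bug fixes in the requested group")
    # zip(*rows) regroups the per-fix tuples into one column per metric.
    return tuple(median(column) for column in zip(*rows))


# Toy data for illustration only; these are not figures from the study.
fixes = [
    BugFix(True, [Commit(["model.py", "train.py"], 30, 12), Commit(["model.py"], 5, 2)]),
    BugFix(True, [Commit(["data/loader.py"], 8, 8)]),
    BugFix(False, [Commit(["cli.py"], 3, 1)]),
]
print("ML bugs:", median_complexity(fixes, ml=True))      # (1.5, 1.5, 32.5)
print("non-ML bugs:", median_complexity(fixes, ml=False))  # (1, 1, 4)
```

The sketch counts distinct files across all commits of a fix, so a file touched by several commits of the same fix is counted once. Whether the study counted changed files this way is not stated in the abstract.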
