Bug characterization in machine learning-based systems

Cited by: 0
Authors
Mohammad Mehdi Morovati
Amin Nikanjam
Florian Tambon
Foutse Khomh
Zhen Ming (Jack) Jiang
Affiliations
[1] Polytechnique Montréal, SWAT Lab
[2] York University
Keywords
Software bug; Software testing; ML-based systems; ML bug; Deep learning; Software maintenance; Empirical study
DOI
Not available
Abstract
The rapid growth of Machine Learning (ML) applications across domains, especially in safety-critical areas, increases the need for reliable ML components, i.e., software components whose operation is based on ML. Since corrective maintenance, i.e., identifying and resolving system bugs, is a key task in delivering reliable software components, it is necessary to investigate the usage of ML components from the software maintenance perspective. Understanding the characteristics of bugs and the maintenance challenges in ML-based systems can help developers of these systems decide where to focus maintenance and testing efforts, by giving insights into the most error-prone components, the most common bugs, etc. In this paper, we investigate the characteristics of bugs in ML-based software systems and the differences between ML and non-ML bugs from the maintenance viewpoint. We extracted 447,948 GitHub repositories that used one of the three most popular ML frameworks, i.e., TensorFlow, Keras, and PyTorch. After multiple filtering steps, we selected the top 300 repositories with the highest number of closed issues. We manually investigated the extracted repositories to exclude non-ML-based systems. Our investigation involved a manual inspection of 386 sampled reported issues in the identified ML-based systems to determine whether they affect ML components or not. Our analysis shows that nearly half of the real issues reported in ML-based systems are ML bugs, indicating that ML components are more error-prone than non-ML components. Next, we thoroughly examined 109 identified ML bugs to identify their root causes and symptoms and to calculate their required fixing time. The results also revealed that ML bugs have significantly different characteristics compared to non-ML bugs in terms of the complexity of bug-fixing (number of commits, changed files, and changed lines of code).
Based on our results, fixing ML bugs is more costly than fixing non-ML bugs, and ML components are more error-prone than non-ML components. Hence, paying significant attention to the reliability of ML components is crucial in ML-based systems. These results deepen the understanding of ML bugs, and we hope that our findings help shed light on opportunities for designing effective tools for testing and debugging ML-based systems.
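The selection steps described in the abstract (ranking repositories by closed issues, then sampling issues for manual inspection) could be sketched roughly as follows. This is only an illustrative sketch, not the authors' tooling: the `Repo` type, the function names, the toy data, and the use of uniform random sampling with a fixed seed are all assumptions, since the abstract does not state how the 386 issues were drawn.

```python
import heapq
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Repo:
    name: str           # hypothetical repository identifier
    closed_issues: int  # number of closed issues in the repository


def top_repos(repos, n):
    """Return the n repositories with the most closed issues, highest first."""
    return heapq.nlargest(n, repos, key=lambda r: r.closed_issues)


def sample_issues(issue_ids, k, seed=0):
    """Draw k distinct issues uniformly at random (assumed sampling scheme).

    A fixed seed keeps the sample reproducible across runs.
    """
    rng = random.Random(seed)
    return rng.sample(list(issue_ids), k)


if __name__ == "__main__":
    # Toy data; the study itself ranked repositories mined from GitHub.
    repos = [Repo("a", 5), Repo("b", 50), Repo("c", 20)]
    selected = top_repos(repos, 2)
    print([r.name for r in selected])

    # Toy issue pool; 386 matches the sample size reported in the abstract.
    inspected = sample_issues(range(1000), 386)
    print(len(inspected))
```

Ranking with `heapq.nlargest` avoids sorting the full repository list when only the top few hundred entries are needed, which matters little at this scale but keeps the intent explicit.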