On a systematic test of ML-based systems: Experiments on test statistics

Authors
Grube, Nicolas [1]
Massah, Mozhdeh [1]
Tebbe, Michael [1]
Wancura, Paul [1]
Wiesbrock, Hans-Werner [1]
Grossmann, Juergen [2]
Kharma, Sami [2]
Affiliations
[1] ITPower Solut GmbH, Berlin, Germany
[2] Fraunhofer Inst Offene Kommunikat Syst FOKUS, Berlin, Germany
Keywords
Testing AI Systems; Black Box Test for AI Systems; Systematic Evaluation of Training Data Sets; Probabilistic Modeling
DOI
10.1109/AITest62860.2024.00010
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Machine learning (ML)-based systems are becoming increasingly ubiquitous, even in safety-critical environments. The strength of ML systems, namely solving complex problems with a stochastic model, leads to challenges in the testing domain. This motivates us to introduce a rigorous testing method for ML models and their application environment, akin to classical software testing, which is independent of the training process and considers the probabilistic nature of ML. The approach is based on the concept of the Probabilistically Extended ONtology (PEON). In brief, a PEON is an ontology modeling the designated Operational Design Domain (ODD), which is extended by assigning probability distributions to classes and their individual attributes, as well as probabilistic dependencies between these attributes. Relevant statistical key figures such as accuracy depend not only on the ML-based model but also strongly on the statistics of the test data set, which we refer to as the quality assurance (QA) data set to emphasize its independence from the test data set used in the training process. This implies that we have to consider the statistical properties of the QA data in order to evaluate an ML-based system. In this paper we present first experimental results comparing established test selection methods, e.g. N-wise, with a new approach, the PEON. Our findings strongly suggest that the underlying statistical properties of the QA data significantly influence the test results of ML-based systems. In this respect, careful attention must be paid to the statistical independence and balance of the QA data. The PEON provides a good basis for the composition of QA data sets that are not only independent of the development process but also statistically representative and balanced with respect to the modeled ODD.
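The abstract's claim that accuracy depends on the statistics of the QA data set, and not only on the model, can be illustrated with a minimal sketch. It assumes a fixed model whose per-class accuracies are known. The class names, the accuracy values, and the helper `overall_accuracy` are hypothetical and are not taken from the paper. The sketch only applies the law of total probability and does not reproduce the PEON method or the authors' experiments.

```python
def overall_accuracy(per_class_accuracy, class_mix):
    """Expected accuracy of a fixed model on a QA set with the given class mix.

    per_class_accuracy: class name -> P(correct | class)
    class_mix:          class name -> share of that class in the QA set
    """
    if abs(sum(class_mix.values()) - 1.0) > 1e-9:
        raise ValueError("class_mix must sum to 1")
    # Law of total probability: P(correct) = sum_c P(c) * P(correct | c)
    return sum(class_mix[c] * per_class_accuracy[c] for c in class_mix)


# Hypothetical per-class accuracies of one fixed model (illustrative only).
per_class_accuracy = {"pedestrian": 0.70, "car": 0.95, "cyclist": 0.80}

# Two hypothetical QA data sets that differ only in their class statistics.
balanced = {"pedestrian": 1 / 3, "car": 1 / 3, "cyclist": 1 / 3}
car_heavy = {"pedestrian": 0.05, "car": 0.90, "cyclist": 0.05}

acc_balanced = overall_accuracy(per_class_accuracy, balanced)
acc_car_heavy = overall_accuracy(per_class_accuracy, car_heavy)

print(f"balanced QA set:  {acc_balanced:.3f}")
print(f"car-heavy QA set: {acc_car_heavy:.3f}")
```

In this toy setting the same model scores noticeably higher on the car-heavy set than on the balanced one, which is why the composition of the QA data has to be stated alongside any reported accuracy.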
Pages: 11-20
Number of pages: 10