On a systematic test of ML-based systems: Experiments on test statistics

Times cited: 0
Authors
Grube, Nicolas [1]
Massah, Mozhdeh [1]
Tebbe, Michael [1]
Wancura, Paul [1]
Wiesbrock, Hans-Werner [1]
Grossmann, Juergen [2]
Kharma, Sami [2]
Affiliations
[1] ITPower Solut GmbH, Berlin, Germany
[2] Fraunhofer Inst Offene Kommunikat Syst FOKUS, Berlin, Germany
Keywords
Testing AI Systems; Black Box Test for AI Systems; Systematic Evaluation of Training data sets; Probabilistic Modeling
DOI
10.1109/AITest62860.2024.00010
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Machine learning (ML)-based systems are becoming increasingly ubiquitous, even in safety-critical environments. The strength of ML systems, solving complex problems with a stochastic model, creates challenges for testing. This motivates us to introduce a rigorous testing method for ML models and their application environment, akin to classical software testing, that is independent of the training process and takes the probabilistic nature of ML into account. The approach is based on the concept of the Probabilistically Extended ONtology (PEON). In brief, a PEON is an ontology modeling the designated Operational Design Domain (ODD), extended by assigning probability distributions to classes and their individual attributes, as well as probabilistic dependencies between these attributes. Relevant statistical key figures such as accuracy depend not only on the ML-based model but also strongly on the statistics of the test data set, which we refer to as the quality assurance (QA) data set to emphasize its independence from the test data set used in the training process. This implies that the statistical properties of the QA data must be considered when evaluating an ML-based system. In this paper we present first experimental results comparing established test selection methods, e.g. N-wise, with a new approach, the PEON. Our findings strongly suggest that the underlying statistical properties of the QA data significantly influence the test results of ML-based systems. Careful attention must therefore be paid to the statistical independence and balance of the QA data. The PEON provides a good basis for composing QA data sets that are not only independent of the development process but also statistically representative and balanced with respect to the modeled ODD.
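The abstract's claim that accuracy depends on the statistics of the QA data set, and not only on the model, can be illustrated with a minimal sketch. It is not the authors' method and does not implement a PEON. It only assumes that overall accuracy is the proportion-weighted average of per-scenario accuracies. The scenario names, the accuracy values, and the function `expected_accuracy` are hypothetical and chosen purely for illustration.

```python
# Hypothetical per-scenario accuracies of one fixed model
# (illustrative numbers, not taken from the paper).
per_scenario_accuracy = {"day": 0.98, "night": 0.80, "fog": 0.60}


def expected_accuracy(mix, per_scenario_accuracy):
    """Overall accuracy as the mix-weighted average of per-scenario accuracies."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("scenario proportions must sum to 1")
    return sum(share * per_scenario_accuracy[name] for name, share in mix.items())


# Two hypothetical QA data sets that differ only in their scenario proportions.
mostly_day = {"day": 0.90, "night": 0.08, "fog": 0.02}
balanced = {"day": 1 / 3, "night": 1 / 3, "fog": 1 / 3}

acc_mostly_day = expected_accuracy(mostly_day, per_scenario_accuracy)
acc_balanced = expected_accuracy(balanced, per_scenario_accuracy)

print(f"mostly-day QA set: {acc_mostly_day:.3f}")
print(f"balanced QA set:   {acc_balanced:.3f}")
```

The model is identical in both cases; only the composition of the QA data changes the reported figure, which is why the composition has to be stated and controlled.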
Pages: 11-20
Page count: 10