On a systematic test of ML-based systems: Experiments on test statistics

Times cited: 0

Authors:

Grube, Nicolas [1]

Massah, Mozhdeh [1]

Tebbe, Michael [1]

Wancura, Paul [1]

Wiesbrock, Hans-Werner [1]

Grossmann, Juergen [2]

Kharma, Sami [2]

Affiliations:

[1] ITPower Solut GmbH, Berlin, Germany

[2] Fraunhofer Inst Offene Kommunikat Syst FOKUS, Berlin, Germany

Source:

2024 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE TESTING, AITEST | 2024

Keywords:

Testing AI Systems; Black Box Test for AI Systems; Systematic Evaluation of Training data sets; Probabilistic Modeling

DOI:

10.1109/AITest62860.2024.00010

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Machine learning (ML)-based systems are becoming increasingly ubiquitous, even in safety-critical environments. The very strength of ML systems, solving complex problems with a stochastic model, creates challenges for testing. This motivates us to introduce a rigorous testing method for ML models and their application environment, akin to classical software testing, that is independent of the training process and accounts for the probabilistic nature of ML. The approach is based on the concept of the Probabilistically Extended ONtology (PEON). In brief, PEON is an ontology modeling the designated Operational Design Domain (ODD), extended by assigning probability distributions to classes and their individual attributes, as well as probabilistic dependencies between these attributes. Relevant statistical key figures such as accuracy depend not only on the ML-based model but also strongly on the statistics of the test data set, which we refer to as the quality assurance (QA) data set to emphasize its independence from the test data set used in the training process. Consequently, the statistical properties of the QA data must be taken into account when evaluating an ML-based system. In this paper, we present first experimental results comparing established test selection methods, e.g., N-wise testing, with a new approach, PEON. Our findings strongly suggest that the underlying statistical properties of the QA data significantly influence the test results of ML-based systems. Careful attention must therefore be paid to the statistical independence and balance of the QA data. PEON provides a good basis for composing QA data sets that are not only independent of the development process but also statistically representative and balanced with respect to the modeled ODD.
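The abstract's claim that accuracy depends on the statistics of the QA data follows from the law of total probability: if a model is correct with probability a(c, l) under an ODD condition (c, l), its expected accuracy on a QA set in which that condition has relative frequency w(c, l) is the sum over all conditions of w(c, l) · a(c, l). The Python sketch below illustrates this under purely hypothetical assumptions and is not the authors' PEON implementation. The toy ODD (classes pedestrian, cyclist and vehicle, plus a lighting attribute that depends on the class), all probabilities, and every function name are invented for illustration. It compares the same simulated model on (a) a QA set whose condition frequencies follow a modeled joint distribution p(class) · p(lighting | class), loosely in the spirit of a probabilistically extended ontology, and (b) a uniform combinatorial set with one test per (class, lighting) pair; with only two parameters, pairwise (2-wise) coverage coincides with this full Cartesian product.

```python
import itertools
import random

# HYPOTHETICAL toy ODD, invented for illustration (not from the paper).
# Class prior p(class).
CLASS_PRIOR = {"pedestrian": 0.2, "cyclist": 0.1, "vehicle": 0.7}

# HYPOTHETICAL probabilistic dependency p(lighting | class).
LIGHTING_GIVEN_CLASS = {
    "pedestrian": {"day": 0.6, "night": 0.4},
    "cyclist": {"day": 0.8, "night": 0.2},
    "vehicle": {"day": 0.5, "night": 0.5},
}

# HYPOTHETICAL probability that the model under test is correct
# for each (class, lighting) condition.
P_CORRECT = {
    ("pedestrian", "day"): 0.95,
    ("pedestrian", "night"): 0.70,
    ("cyclist", "day"): 0.90,
    ("cyclist", "night"): 0.50,
    ("vehicle", "day"): 0.99,
    ("vehicle", "night"): 0.95,
}


def expected_accuracy(weights):
    """Weighted mean of per-condition accuracies (law of total probability)."""
    total = sum(weights.values())
    return sum(w * P_CORRECT[cond] for cond, w in weights.items()) / total


def odd_weights():
    """Joint probability p(class, lighting) = p(class) * p(lighting | class)."""
    return {
        (cls, light): CLASS_PRIOR[cls] * p_light
        for cls, dist in LIGHTING_GIVEN_CLASS.items()
        for light, p_light in dist.items()
    }


def combinatorial_weights():
    """One test per (class, lighting) pair, i.e. uniform weights.

    With two parameters, pairwise (2-wise) coverage equals this full product.
    """
    lightings = ["day", "night"]
    return {cond: 1.0 for cond in itertools.product(CLASS_PRIOR, lightings)}


def sample_qa_set(n, rng):
    """Draw n (class, lighting) conditions from the joint ODD distribution."""
    weights = odd_weights()
    conds = list(weights)
    return rng.choices(conds, weights=[weights[c] for c in conds], k=n)


def measured_accuracy(qa_set, rng):
    """Simulate the model on each QA sample and return the fraction correct."""
    correct = sum(rng.random() < P_CORRECT[cond] for cond in qa_set)
    return correct / len(qa_set)


if __name__ == "__main__":
    rng = random.Random(0)
    print(f"ODD-weighted expected accuracy: {expected_accuracy(odd_weights()):.3f}")
    print(f"Uniform combinatorial accuracy: {expected_accuracy(combinatorial_weights()):.3f}")
    qa = sample_qa_set(10_000, rng)
    print(f"Measured on a sampled QA set:   {measured_accuracy(qa, rng):.3f}")
```

With these hypothetical numbers, the ODD-weighted expected accuracy is 0.931, whereas the uniform combinatorial set yields about 0.832, although the simulated model is identical in both cases; the sampled QA set reproduces the ODD-weighted value up to sampling noise. For this toy case only, it shows why the composition of the QA data should be reported alongside any accuracy figure.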


Pages: 11–20

Number of pages: 10
