A Testing Framework for AI Linguistic Systems (testFAILS)

Citations: 0
Authors
Kumar, Y. [1 ]
Morreale, P. [1 ]
Sorial, P. [1 ]
Delgado, J. [1 ]
Li, J. Jenny [1 ]
Martins, P. [1 ]
Affiliations
[1] Kean Univ, Dept Comp Sci & Technol, Union, NJ 07083 USA
Keywords
Chatbots; Validation of Chatbots; Bot Technologies; AI Linguistic Systems Testing Framework (testFAILS); AIDoctor;
DOI
10.1109/AITest58265.2023.00017
CLC Number (Chinese Library Classification)
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper introduces testFAILS, an innovative testing framework designed for the rigorous evaluation of AI Linguistic Systems, with a particular emphasis on various iterations of ChatGPT. Leveraging orthogonal array coverage, this framework provides a robust mechanism for assessing AI systems, addressing the critical question, "How should we evaluate AI?" While the Turing test has traditionally been the benchmark for AI evaluation, we argue that current publicly available chatbots, despite their rapid advancements, have yet to meet this standard. However, the pace of progress suggests that achieving Turing test-level performance may be imminent. In the interim, the need for effective AI evaluation and testing methodologies remains paramount. Our research, which is ongoing, has already validated several versions of ChatGPT, and we are currently conducting comprehensive testing on the latest models, including ChatGPT-4, Bard, Bing Bot, and the LLaMA model. The testFAILS framework is designed to be adaptable, ready to evaluate new bot versions as they are released. Additionally, we have tested available chatbot APIs and developed our own application, AIDoctor, utilizing the ChatGPT-4 model and Microsoft Azure AI technologies.
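The abstract's central technique, orthogonal array coverage, reduces the number of test runs needed to exercise interactions between configuration factors. As a minimal illustrative sketch (not the authors' implementation; the factor names are hypothetical), an L4(2^3) orthogonal array covers every pairwise value combination of three two-level factors in 4 runs instead of the full 2^3 = 8:

```python
from itertools import combinations, product

# Hypothetical factors for chatbot testing: each row is one test run,
# each column a two-level factor (e.g. model version, prompt language,
# temperature setting). This L4(2^3) array covers all value pairs for
# every pair of columns using only 4 runs.
L4 = [
    (0, 0, 0),
    (0, 1, 1),
    (1, 0, 1),
    (1, 1, 0),
]

def covers_all_pairs(rows, levels=2):
    """Check that every pair of factor columns exhibits every value pair."""
    n_factors = len(rows[0])
    for i, j in combinations(range(n_factors), 2):
        seen = {(row[i], row[j]) for row in rows}
        if seen != set(product(range(levels), repeat=2)):
            return False
    return True

print(covers_all_pairs(L4))  # True: 4 runs achieve full pairwise coverage
```

The same property fails for an arbitrary half-size subset of the full factorial, which is why orthogonal arrays are constructed rather than sampled.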
Pages: 51-54
Page count: 4