A Testing Framework for AI Linguistic Systems (testFAILS)

Citations: 0
Authors
Kumar, Y. [1 ]
Morreale, P. [1 ]
Sorial, P. [1 ]
Delgado, J. [1 ]
Li, J. Jenny [1 ]
Martins, P. [1 ]
Affiliations
[1] Kean Univ, Dept Comp Sci & Technol, Union, NJ 07083 USA
Keywords
Chatbots; Validation of Chatbots; Bot Technologies; AI Linguistic Systems Testing Framework (testFAILS); AIDoctor;
DOI
10.1109/AITest58265.2023.00017
CLC Number (Chinese Library Classification)
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper introduces testFAILS, an innovative testing framework designed for the rigorous evaluation of AI Linguistic Systems, with a particular emphasis on various iterations of ChatGPT. Leveraging orthogonal array coverage, this framework provides a robust mechanism for assessing AI systems, addressing the critical question, "How should we evaluate AI?" While the Turing test has traditionally been the benchmark for AI evaluation, we argue that current publicly available chatbots, despite their rapid advancements, have yet to meet this standard. However, the pace of progress suggests that achieving Turing test-level performance may be imminent. In the interim, the need for effective AI evaluation and testing methodologies remains paramount. Our research, which is ongoing, has already validated several versions of ChatGPT, and we are currently conducting comprehensive testing on the latest models, including ChatGPT-4, Bard, Bing Bot, and the LLaMA model. The testFAILS framework is designed to be adaptable, ready to evaluate new bot versions as they are released. Additionally, we have tested available chatbot APIs and developed our own application, AIDoctor, utilizing the ChatGPT-4 model and Microsoft Azure AI technologies.
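The abstract's central technique, orthogonal array coverage, reduces the number of test runs needed to exercise interactions between configuration factors. As a minimal illustrative sketch (not the authors' implementation; the factor names are hypothetical), an L4(2^3) orthogonal array covers every pairwise value combination of three two-level factors in 4 runs instead of the full 2^3 = 8:

```python
from itertools import combinations, product

# Hypothetical factors for chatbot testing: each row is one test run,
# each column a two-level factor (e.g. model version, prompt language,
# temperature setting). This L4(2^3) array covers all value pairs for
# every pair of columns using only 4 runs.
L4 = [
    (0, 0, 0),
    (0, 1, 1),
    (1, 0, 1),
    (1, 1, 0),
]

def covers_all_pairs(rows, levels=2):
    """Check that every pair of factor columns exhibits every value pair."""
    n_factors = len(rows[0])
    for i, j in combinations(range(n_factors), 2):
        seen = {(row[i], row[j]) for row in rows}
        if seen != set(product(range(levels), repeat=2)):
            return False
    return True

print(covers_all_pairs(L4))  # True: 4 runs achieve full pairwise coverage
```

The same property fails for an arbitrary half-size subset of the full factorial, which is why orthogonal arrays are constructed rather than sampled.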
Pages: 51-54
Page count: 4