A Testing Framework for AI Linguistic Systems (testFAILS)

Cited by: 0
Authors
Kumar, Y. [1 ]
Morreale, P. [1 ]
Sorial, P. [1 ]
Delgado, J. [1 ]
Li, J. Jenny [1 ]
Martins, P. [1 ]
Affiliations
[1] Kean Univ, Dept Comp Sci & Technol, Union, NJ 07083 USA
Keywords
Chatbots; Validation of Chatbots; Bot Technologies; AI Linguistic Systems Testing Framework (testFAILS); AIDoctor
DOI
10.1109/AITest58265.2023.00017
CLC Classification Code
TP18 [Theory of artificial intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
This paper introduces testFAILS, an innovative testing framework designed for the rigorous evaluation of AI Linguistic Systems, with a particular emphasis on various iterations of ChatGPT. Leveraging orthogonal array coverage, this framework provides a robust mechanism for assessing AI systems, addressing the critical question, "How should we evaluate AI?" While the Turing test has traditionally been the benchmark for AI evaluation, we argue that current publicly available chatbots, despite their rapid advancements, have yet to meet this standard. However, the pace of progress suggests that achieving Turing test-level performance may be imminent. In the interim, the need for effective AI evaluation and testing methodologies remains paramount. Our research, which is ongoing, has already validated several versions of ChatGPT, and we are currently conducting comprehensive testing on the latest models, including ChatGPT-4, Bard, Bing Bot, and the LLaMA model. The testFAILS framework is designed to be adaptable, ready to evaluate new bot versions as they are released. Additionally, we have tested available chatbot APIs and developed our own application, AIDoctor, utilizing the ChatGPT-4 model and Microsoft Azure AI technologies.
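The abstract attributes the framework's coverage to orthogonal arrays, a combinatorial-testing device that exercises every pair of factor levels with far fewer runs than exhaustive enumeration. The Python sketch below illustrates that idea with a standard L9(3^4) array; the factor names and levels are hypothetical stand-ins, since the paper does not publish its actual test dimensions, and this is not a reproduction of the testFAILS design.

    # Sketch of orthogonal-array test selection in the spirit of testFAILS.
    # The four factors and their levels are hypothetical placeholders; an
    # L9(3^4) array trims 81 possible prompt configurations to 9 runs while
    # still covering every pair of factor levels at least once.

    from itertools import combinations

    # Hypothetical chatbot-testing factors, three levels each.
    FACTORS = {
        "language":     ["English", "Spanish", "Mandarin"],
        "domain":       ["medical triage", "coding help", "casual chat"],
        "prompt_style": ["direct question", "multi-turn follow-up", "ambiguous request"],
        "output_check": ["factual accuracy", "safety/refusal", "consistency across runs"],
    }

    # Standard Taguchi L9(3^4) orthogonal array: 9 runs, 4 columns, 3 levels.
    # Every pair of columns contains each of the 9 level pairs exactly once.
    L9 = [
        (0, 0, 0, 0), (0, 1, 1, 1), (0, 2, 2, 2),
        (1, 0, 1, 2), (1, 1, 2, 0), (1, 2, 0, 1),
        (2, 0, 2, 1), (2, 1, 0, 2), (2, 2, 1, 0),
    ]

    def build_test_plan():
        """Map each L9 row onto concrete factor levels."""
        names = list(FACTORS)
        return [
            {names[col]: FACTORS[names[col]][level] for col, level in enumerate(row)}
            for row in L9
        ]

    def verify_pairwise_coverage():
        """Confirm every pair of levels appears for every pair of factors."""
        for c1, c2 in combinations(range(4), 2):
            seen = {(row[c1], row[c2]) for row in L9}
            assert len(seen) == 9, f"columns {c1},{c2} miss some level pairs"

    if __name__ == "__main__":
        verify_pairwise_coverage()
        for i, case in enumerate(build_test_plan(), start=1):
            print(f"Test {i}: {case}")

Running the script prints nine prompt configurations instead of the 81 exhaustive combinations, and the assertion confirms that every pairwise combination of factor levels is still exercised, which is the coverage property orthogonal-array testing relies on.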
Pages: 51 - 54
Page count: 4