A Comprehensive Review of AI Advancement Using testFAILS and testFAILS-2 for the Pursuit of AGI

Authors
Kumar, Yulia [1, 2]
Lin, Mengtian [1]
Paredes, Christopher [1]
Li, Dan [1]
Yang, Guohao [1]
Kruger, Dov [2]
Li, J. Jenny [1]
Morreale, Patricia [1]
Affiliations
[1] Kean Univ, Dept Comp Sci & Technol, Union, NJ 07083 USA
[2] Rutgers State Univ, Dept Elect & Comp Engn, Piscataway, NJ 08854 USA
Source
ELECTRONICS | 2024 / Vol. 13 / Issue 24
Keywords
AI evaluation; testFAILS-2; artificial general intelligence; multimodal AI; AI linguistic systems
DOI
10.3390/electronics13244991
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In a previous paper, we defined testFAILS, a set of benchmarks for measuring the efficacy of Large Language Models in various domains. This paper defines a second-generation framework, testFAILS-2, to measure how current AI engines are progressing towards Artificial General Intelligence (AGI). The testFAILS-2 framework offers enhanced evaluation metrics that address the latest developments in Artificial Intelligence Linguistic Systems (AILS). A key feature of this review is the "Chat with Alan" project, a Retrieval-Augmented Generation (RAG)-based AI bot inspired by Alan Turing, designed to distinguish between human- and AI-generated interactions, thereby emulating Turing's original vision. We assess a variety of models, including ChatGPT-4o-mini and other Small Language Models (SLMs), as well as prominent Large Language Models (LLMs), using expanded criteria that encompass result relevance, accessibility, cost, multimodality, agent creation capabilities, emotional AI attributes, AI search capacity, and LLM-robot integration. The analysis reveals that testFAILS-2 significantly enhances the evaluation of model robustness and user productivity, while also identifying critical areas for improvement in multimodal processing and emotional reasoning. By integrating rigorous evaluation standards and novel testing methodologies, testFAILS-2 advances the assessment of AILS, providing essential insights that contribute to the ongoing development of more effective and resilient AI systems on the path towards AGI.
Pages: 50