A Comprehensive Review of AI Advancement Using testFAILS and testFAILS-2 for the Pursuit of AGI

Cited by: 0

Authors:

Kumar, Yulia [1,2]

Lin, Mengtian [1]

Paredes, Christopher [1]

Li, Dan [1]

Yang, Guohao [1]

Kruger, Dov [2]

Li, J. Jenny [1]

Morreale, Patricia [1]

Affiliations:

[1] Kean Univ, Dept Comp Sci & Technol, Union, NJ 07083 USA

[2] Rutgers State Univ, Dept Elect & Comp Engn, Piscataway, NJ 08854 USA

Source:

ELECTRONICS | 2024 / Vol. 13 / Issue 24

Keywords:

AI evaluation; testFAILS-2; artificial general intelligence; multimodal AI; AI linguistic systems;

DOI:

10.3390/electronics13244991

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

In a previous paper we defined testFAILS, a set of benchmarks for measuring the efficacy of Large Language Models in various domains. This paper defines a second-generation framework, testFAILS-2, to measure how current AI engines are progressing towards Artificial General Intelligence (AGI). The testFAILS-2 framework offers enhanced evaluation metrics that address the latest developments in Artificial Intelligence Linguistic Systems (AILS). A key feature of this review is the "Chat with Alan" project, a Retrieval-Augmented Generation (RAG)-based AI bot inspired by Alan Turing, designed to distinguish between human- and AI-generated interactions, thereby emulating Turing's original vision. We assess a variety of models, including ChatGPT-4o-mini and other Small Language Models (SLMs), as well as prominent Large Language Models (LLMs), utilizing expanded criteria that encompass result relevance, accessibility, cost, multimodality, agent creation capabilities, emotional AI attributes, AI search capacity, and LLM-robot integration. The analysis reveals that testFAILS-2 significantly enhances the evaluation of model robustness and user productivity, while also identifying critical areas for improvement in multimodal processing and emotional reasoning. By integrating rigorous evaluation standards and novel testing methodologies, testFAILS-2 advances the assessment of AILS, providing essential insights that contribute to the ongoing development of more effective and resilient AI systems towards achieving AGI.

Pages: 50
