Evaluating the capabilities of large language models using machine learning tasks at inference-time

Cited by: 0
Authors:
Grm, Klemen [1]
Affiliations:
[1] Univ Ljubljani, Fak Elektrotehniko, Trzaska Cesta 25, Ljubljana 1000, Slovenia
Source:
ELEKTROTEHNISKI VESTNIK, 2023, Vol. 90, Issue 5
Keywords:
language models; machine learning; evaluation methodology;
DOI:
Not available
Chinese Library Classification:
TM (Electrical engineering); TN (Electronic and communication technology)
Discipline codes:
0808; 0809
Abstract:
Machine learning is the domain of algorithms capable of learning from data to improve their performance on a task or set of tasks. Common machine learning tasks include classification, regression, and generative modelling. The most common modern examples of machine learners in practical use are deep neural networks coupled with an extrinsic optimizer such as stochastic gradient descent. Recently, scaled-up large language models have shown increasing capabilities of in-context meta-learning, which has been used to improve their performance on language tasks through few-shot learning. In this paper, we show that pre-trained large language models can act as machine learners with respect to in-context data, without using extrinsic optimization tools or weight updates. By evaluating the language models' inference-time machine learning abilities on synthetic or appropriately transformed datasets, we conclusively show that they are able to model complex relationships between the data in the input context. This implies that inference-time machine learning represents a meaningful capability evaluation task for large language models.
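The evaluation setup the abstract describes — presenting a synthetic supervised dataset entirely in the prompt and asking the model to predict a held-out target with no weight updates — can be sketched as follows. This is a minimal illustrative example, not the paper's exact protocol: the linear dataset, the prompt format, and the `complete(prompt)` call (standing in for any pre-trained LLM completion API) are all assumptions.

```python
import random

def make_regression_prompt(n_examples=8, seed=0):
    """Format a synthetic noisy linear dataset (y = 3x + 2 + noise)
    as a few-shot in-context prompt for a language model."""
    rng = random.Random(seed)
    lines = []
    for _ in range(n_examples):
        x = round(rng.uniform(-5, 5), 2)
        y = round(3 * x + 2 + rng.gauss(0, 0.1), 2)
        lines.append(f"Input: {x} -> Output: {y}")
    # Held-out query: the model must infer the relationship from context.
    query_x = 1.5
    lines.append(f"Input: {query_x} -> Output:")
    return "\n".join(lines), query_x

prompt, query_x = make_regression_prompt()
# A model with inference-time regression ability should complete the
# prompt with a value near 3 * 1.5 + 2 = 6.5, without any weight updates:
# prediction = complete(prompt)  # hypothetical LLM completion call
```

Scoring the completion against the known generating function (here, the slope and intercept) gives a quantitative measure of the model's in-context learning ability, which is the core idea behind using such tasks for capability evaluation.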
Pages: 247-253 (7 pages)