Evaluating the capabilities of large language models using machine learning tasks at inference-time

Cited by: 0
Authors:
Grm, Klemen [1]
Affiliations:
[1] Univ Ljubljani, Fak Elektrotehniko, Trzaska Cesta 25, Ljubljana 1000, Slovenia
Source:
ELEKTROTEHNISKI VESTNIK, 2023, Vol. 90, Issue 5
Keywords:
language models; machine learning; evaluation methodology;
DOI:
Not available
Chinese Library Classification:
TM (Electrical engineering); TN (Electronic and communication technology)
Discipline codes:
0808; 0809
Abstract:
Machine learning is the domain of algorithms capable of learning from data to improve their performance on a task or set of tasks. Common machine learning tasks include classification, regression, and generative modelling. The most common modern examples of machine learners in practical use are deep neural networks coupled with an extrinsic optimizer such as stochastic gradient descent. Recently, scaled-up large language models have shown increasing capabilities of in-context meta-learning, which has been used to improve their performance on language tasks through few-shot learning. In this paper, we show that pre-trained large language models can act as machine learners with respect to in-context data, without using extrinsic optimization tools or weight updates. By evaluating the language models' inference-time machine learning abilities on synthetic or appropriately transformed datasets, we conclusively show that they are able to model complex relationships between the data in the input context. This implies that inference-time machine learning represents a meaningful capability evaluation task for large language models.
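The evaluation setup the abstract describes — presenting a synthetic supervised dataset entirely in the prompt and asking the model to predict a held-out target with no weight updates — can be sketched as follows. This is a minimal illustrative example, not the paper's exact protocol: the linear dataset, the prompt format, and the `complete(prompt)` call (standing in for any pre-trained LLM completion API) are all assumptions.

```python
import random

def make_regression_prompt(n_examples=8, seed=0):
    """Format a synthetic noisy linear dataset (y = 3x + 2 + noise)
    as a few-shot in-context prompt for a language model."""
    rng = random.Random(seed)
    lines = []
    for _ in range(n_examples):
        x = round(rng.uniform(-5, 5), 2)
        y = round(3 * x + 2 + rng.gauss(0, 0.1), 2)
        lines.append(f"Input: {x} -> Output: {y}")
    # Held-out query: the model must infer the relationship from context.
    query_x = 1.5
    lines.append(f"Input: {query_x} -> Output:")
    return "\n".join(lines), query_x

prompt, query_x = make_regression_prompt()
# A model with inference-time regression ability should complete the
# prompt with a value near 3 * 1.5 + 2 = 6.5, without any weight updates:
# prediction = complete(prompt)  # hypothetical LLM completion call
```

Scoring the completion against the known generating function (here, the slope and intercept) gives a quantitative measure of the model's in-context learning ability, which is the core idea behind using such tasks for capability evaluation.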
Pages: 247-253 (7 pages)