Evaluating the capabilities of large language models using machine learning tasks at inference-time

Cited by: 0

Authors:

Grm, Klemen [1]

Affiliation:

[1] Univ Ljubljani, Fak Elektrotehniko, Trzaska Cesta 25, Ljubljana 1000, Slovenia

Source:

ELEKTROTEHNISKI VESTNIK | 2023 / Vol. 90 / Issue 05

Keywords:

language models; machine learning; evaluation methodology

DOI:

Not available

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology]

Discipline classification code:

0808; 0809

Abstract:

Machine learning is the domain of algorithms capable of learning from data to improve their performance on a task or set of tasks. Common machine learning tasks include classification, regression, and generative modelling. The most common modern example of machine learners in practical use is deep neural networks coupled with an extrinsic optimizer such as stochastic gradient descent. Recently, scaled-up large language models have shown increasing capabilities for in-context meta-learning, which has been used to improve their performance on language tasks through few-shot learning. In this paper, we show that pre-trained large language models can act as machine learners with regard to in-context data, without using extrinsic optimization tools or weight updates. By evaluating the language models' inference-time machine learning abilities on synthetic or appropriately transformed datasets, we conclusively show that they are able to model complex relationships between data in the input context. This implies that inference-time machine learning tasks represent a meaningful capability evaluation task for large language models.
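The evaluation idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's protocol: the linear task, the `x = ..., y = ...` prompt format, and all function names below are assumptions made for illustration. The sketch builds a synthetic regression dataset, serializes the training pairs into an in-context prompt, and scores any predictor that maps a prompt string to a number. A simple mean baseline stands in where a language model call would go.

```python
import random


def make_linear_dataset(n, slope=2.0, intercept=1.0, seed=0):
    """Synthetic regression data y = slope * x + intercept (hypothetical task)."""
    rng = random.Random(seed)
    xs = [round(rng.uniform(-10, 10), 2) for _ in range(n)]
    return [(x, round(slope * x + intercept, 2)) for x in xs]


def build_prompt(train_pairs, query_x):
    """Serialize in-context examples followed by one unanswered query."""
    lines = [f"x = {x}, y = {y}" for x, y in train_pairs]
    lines.append(f"x = {query_x}, y =")
    return "\n".join(lines)


def mean_absolute_error(predict, train_pairs, test_pairs):
    """Score any callable that maps a prompt string to a numeric prediction."""
    errors = [abs(predict(build_prompt(train_pairs, x)) - y) for x, y in test_pairs]
    return sum(errors) / len(errors)


def mean_baseline(prompt):
    """Stand-in for a model: predicts the mean of the in-context targets."""
    ys = [float(line.split("y = ")[1]) for line in prompt.splitlines()[:-1]]
    return sum(ys) / len(ys)


data = make_linear_dataset(24)
train, test = data[:16], data[16:]
baseline_mae = mean_absolute_error(mean_baseline, train, test)
print(f"mean-baseline MAE: {baseline_mae:.3f}")
```

To evaluate an actual model under these assumptions, `mean_baseline` would be replaced by a function that sends the prompt to the model and parses the returned number. A model that learns the in-context relationship should score well below the baseline error.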


Pages: 247-253

Page count: 7

