Evaluating the capabilities of large language models using machine learning tasks at inference-time

Cited: 0
Authors
Grm, Klemen [1]
Affiliations
[1] Univ Ljubljani, Fak Elektrotehniko, Trzaska Cesta 25, Ljubljana 1000, Slovenia
Source
ELEKTROTEHNISKI VESTNIK | 2023, Vol. 90, Issue 05
Keywords
language models; machine learning; evaluation methodology;
DOI
Not available
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Machine learning is the domain of algorithms capable of learning from data to improve their performance on a task or set of tasks. Common machine learning tasks include classification, regression, and generative modelling. The most common modern example of machine learners in practical use is deep neural networks coupled with an extrinsic optimizer such as stochastic gradient descent. Recently, scaled-up large language models have shown increasing capabilities of in-context meta-learning, which has been used to improve their performance on language tasks through few-shot learning. In this paper, we show that pre-trained large language models can act as machine learners with respect to in-context data, without using extrinsic optimization tools or weight updates. By evaluating the language models' inference-time machine learning abilities on synthetic or appropriately transformed datasets, we show that they are able to model complex relationships between the data in the input context. This implies that inference-time machine learning represents a meaningful capability evaluation task for large language models.
Pages: 247-253
Page count: 7
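The abstract describes evaluating a language model's in-context learning as machine learning at inference time: examples from a synthetic dataset are placed in the prompt and the model must predict the output for an unseen input, with no weight updates. The sketch below is a minimal illustration of that setup, not the paper's actual protocol; the synthetic task, prompt format, and the query_llm function are assumptions standing in for whichever model API is being evaluated.

```python
# Minimal sketch of an inference-time machine learning evaluation:
# a synthetic regression task is serialized into a few-shot prompt,
# the model's completion is parsed as a numeric prediction, and the
# error against the true target is measured. `query_llm` is a
# hypothetical stand-in for the model under evaluation.

import numpy as np

rng = np.random.default_rng(0)


def make_synthetic_regression(n_examples: int = 32) -> tuple[np.ndarray, np.ndarray]:
    """Generate (x, y) pairs from a noisy nonlinear relationship."""
    x = rng.uniform(-3.0, 3.0, size=n_examples)
    y = 0.5 * x ** 2 - x + rng.normal(scale=0.1, size=n_examples)
    return x, y


def build_prompt(x_train, y_train, x_query) -> str:
    """Serialize in-context examples as 'input -> output' lines, ending with the query."""
    lines = [f"input: {xi:.3f} -> output: {yi:.3f}" for xi, yi in zip(x_train, y_train)]
    lines.append(f"input: {x_query:.3f} -> output:")
    return "\n".join(lines)


def query_llm(prompt: str) -> str:
    """Hypothetical model call; replace with the API of the model being evaluated."""
    raise NotImplementedError("plug in the language model under evaluation")


def evaluate_one_query(x_train, y_train, x_query, y_true) -> float:
    """Absolute error of the model's in-context prediction for a single query point."""
    completion = query_llm(build_prompt(x_train, y_train, x_query))
    y_pred = float(completion.strip().split()[0])  # parse the leading number
    return abs(y_pred - y_true)


if __name__ == "__main__":
    x, y = make_synthetic_regression()
    # Hold out the last point as the query; the rest serve as in-context examples.
    print(build_prompt(x[:-1], y[:-1], x[-1]))
```

Aggregating such per-query errors over many synthetic tasks, and comparing them against a conventional learner fit to the same in-context examples, would give the kind of inference-time capability measure the abstract refers to.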