Evaluating the capabilities of large language models using machine learning tasks at inference-time

Cited: 0
Authors
Grm, Klemen [1]
Affiliations
[1] Univ Ljubljani, Fak Elektrotehniko, Trzaska Cesta 25, Ljubljana 1000, Slovenia
Source
ELEKTROTEHNISKI VESTNIK | 2023, Vol. 90, Issue 05
Keywords
language models; machine learning; evaluation methodology;
DOI
Not available
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Machine learning is the domain of algorithms capable of learning from data to improve their performance on a task or set of tasks. Common machine learning tasks include classification, regression, and generative modelling. The most common modern example of machine learners in practical use is deep neural networks coupled with an extrinsic optimizer such as stochastic gradient descent. Recently, scaled-up large language models have shown increasing capabilities of in-context meta-learning, which has been used to improve their performance on language tasks through few-shot learning. In this paper, we show that pre-trained large language models can act as machine learners with respect to in-context data, without using extrinsic optimization tools or weight updates. By evaluating the language models' inference-time machine learning abilities on synthetic or appropriately transformed datasets, we show that they are able to model complex relationships between the data in the input context. This implies that inference-time machine learning represents a meaningful capability evaluation task for large language models.
Pages: 247-253
Page count: 7
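The abstract describes evaluating a language model's in-context learning as machine learning at inference time: examples from a synthetic dataset are placed in the prompt and the model must predict the output for an unseen input, with no weight updates. The sketch below is a minimal illustration of that setup, not the paper's actual protocol; the synthetic task, prompt format, and the query_llm function are assumptions standing in for whichever model API is being evaluated.

```python
# Minimal sketch of an inference-time machine learning evaluation:
# a synthetic regression task is serialized into a few-shot prompt,
# the model's completion is parsed as a numeric prediction, and the
# error against the true target is measured. `query_llm` is a
# hypothetical stand-in for the model under evaluation.

import numpy as np

rng = np.random.default_rng(0)


def make_synthetic_regression(n_examples: int = 32) -> tuple[np.ndarray, np.ndarray]:
    """Generate (x, y) pairs from a noisy nonlinear relationship."""
    x = rng.uniform(-3.0, 3.0, size=n_examples)
    y = 0.5 * x ** 2 - x + rng.normal(scale=0.1, size=n_examples)
    return x, y


def build_prompt(x_train, y_train, x_query) -> str:
    """Serialize in-context examples as 'input -> output' lines, ending with the query."""
    lines = [f"input: {xi:.3f} -> output: {yi:.3f}" for xi, yi in zip(x_train, y_train)]
    lines.append(f"input: {x_query:.3f} -> output:")
    return "\n".join(lines)


def query_llm(prompt: str) -> str:
    """Hypothetical model call; replace with the API of the model being evaluated."""
    raise NotImplementedError("plug in the language model under evaluation")


def evaluate_one_query(x_train, y_train, x_query, y_true) -> float:
    """Absolute error of the model's in-context prediction for a single query point."""
    completion = query_llm(build_prompt(x_train, y_train, x_query))
    y_pred = float(completion.strip().split()[0])  # parse the leading number
    return abs(y_pred - y_true)


if __name__ == "__main__":
    x, y = make_synthetic_regression()
    # Hold out the last point as the query; the rest serve as in-context examples.
    print(build_prompt(x[:-1], y[:-1], x[-1]))
```

Aggregating such per-query errors over many synthetic tasks, and comparing them against a conventional learner fit to the same in-context examples, would give the kind of inference-time capability measure the abstract refers to.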