Evaluating the capabilities of large language models using machine learning tasks at inference-time

Authors
Grm, Klemen [1]
Affiliation
[1] Univ Ljubljani, Fak Elektrotehniko, Trzaska Cesta 25, Ljubljana 1000, Slovenia
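Source
ELEKTROTEHNISKI VESTNIK | 2023 / Volume 90 / Issue 05
Keywords
language models; machine learning; evaluation methodology
DOI
Not available
Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification code
0808; 0809
Abstract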
Machine learning is the domain of algorithms capable of learning from data to improve their performance on a task or set of tasks. Common machine learning tasks include classification, regression, and generative modelling. The most common modern machine learners in practical use are deep neural networks coupled with an extrinsic optimizer such as stochastic gradient descent. Recently, scaled-up large language models have shown an increasing capacity for in-context meta-learning, which has been used to improve their performance on language tasks through few-shot learning. In this paper, we show that pre-trained large language models can act as machine learners with regard to in-context data, without using extrinsic optimization tools or weight updates. By evaluating the language models' inference-time machine learning abilities on synthetic or appropriately transformed datasets, we conclusively show that they are able to model complex relationships between data in the input context. This implies that inference-time machine learning tasks represent a meaningful capability evaluation task for large language models.
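The evaluation idea in the abstract can be illustrated with a minimal sketch. This is not the paper's protocol: the prompt layout, the linear synthetic task, the function names, and the `complete` callable (a stand-in for any text-completion model) are all assumptions made for illustration. The sketch serializes labelled examples into the context, asks the model to continue the final line, and scores the parsed answers, with no weight updates involved.

```python
import random

def make_linear_dataset(n, slope=2.0, intercept=1.0, seed=0):
    """Synthetic regression data y = slope * x + intercept (an assumed toy task)."""
    rng = random.Random(seed)
    xs = [round(rng.uniform(-5, 5), 2) for _ in range(n)]
    return [(x, round(slope * x + intercept, 2)) for x in xs]

def build_prompt(train, query_x):
    """Serialize in-context examples, ending with an unanswered query line."""
    lines = [f"x = {x}, y = {y}" for x, y in train]
    lines.append(f"x = {query_x}, y =")
    return "\n".join(lines)

def parse_prediction(completion):
    """Read the first whitespace-separated token as a float; None if unparseable."""
    try:
        return float(completion.strip().split()[0])
    except (ValueError, IndexError):
        return None

def mean_absolute_error(preds, targets):
    """MAE over parseable predictions only; returns None if none parsed."""
    pairs = [(p, t) for p, t in zip(preds, targets) if p is not None]
    if not pairs:
        return None
    return sum(abs(p - t) for p, t in pairs) / len(pairs)

def evaluate(complete, train, test):
    """Score a completion function on held-out queries.

    `complete` is a hypothetical callable mapping a prompt string to the
    model's continuation string; plug in whichever model client is in use.
    """
    preds = [parse_prediction(complete(build_prompt(train, x))) for x, _ in test]
    return mean_absolute_error(preds, [y for _, y in test])
```

To use it, split a dataset from `make_linear_dataset` into in-context and held-out parts and pass a model-backed `complete` function to `evaluate`. Because unparseable completions are skipped here, it is worth reporting the parse failure rate alongside the error.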
Pages: 247-253
Page count: 7