Language Models are Few-Shot Learners

Cited by: 0
Authors
Brown, Tom B.
Mann, Benjamin
Ryder, Nick
Subbiah, Melanie
Kaplan, Jared [1,2]
Dhariwal, Prafulla
Neelakantan, Arvind
Shyam, Pranav
Sastry, Girish
Askell, Amanda
Agarwal, Sandhini
Herbert-Voss, Ariel
Krueger, Gretchen
Henighan, Tom
Child, Rewon
Ramesh, Aditya
Ziegler, Daniel M.
Wu, Jeffrey
Winter, Clemens
Hesse, Christopher
Chen, Mark
Sigler, Eric
Litwin, Mateusz
Gray, Scott
Chess, Benjamin
Clark, Jack
Berner, Christopher
McCandlish, Sam
Radford, Alec
Sutskever, Ilya
Amodei, Dario
Affiliations
[1] Johns Hopkins Univ, Baltimore, MD 21218 USA
[2] OpenAI, San Francisco, CA 94110 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
We demonstrate that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even becoming competitive with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks. We also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora.
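In practice, the few-shot setting described above reduces to prompt construction: the task description, the demonstrations, and the query are concatenated into plain text that a frozen model completes. The minimal Python sketch below illustrates that idea; the `build_few_shot_prompt` helper, the `=>` separator, and the particular translation demonstrations are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only (not the paper's code): "tasks and few-shot
# demonstrations specified purely via text" means the prompt itself carries
# the task. A task description, K demonstrations, and a query are joined into
# one string for a frozen autoregressive model to complete; no gradient
# updates are involved.

def build_few_shot_prompt(task_description, demonstrations, query):
    """Assemble a few-shot prompt: description, K demos, then the query."""
    lines = [task_description, ""]
    for source, target in demonstrations:
        lines.append(f"{source} => {target}")
    # The model is expected to continue the text after the final "=>".
    lines.append(f"{query} =>")
    return "\n".join(lines)

# Example in the spirit of the paper's translation prompts; the "=>" format
# and these demonstrations are hypothetical.
prompt = build_few_shot_prompt(
    "Translate English to French:",
    [
        ("sea otter", "loutre de mer"),
        ("peppermint", "menthe poivrée"),
        ("plush giraffe", "girafe en peluche"),
    ],
    "cheese",
)
print(prompt)
```

Varying the number of demonstrations passed in corresponds to the paper's zero-shot (no demos), one-shot, and few-shot conditions, with the model weights held fixed throughout.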
Pages: 25
Related Papers
50 items in total (items [41]-[50] shown)
  • [41] Few-Shot Keyword Spotting in Any Language
    Mazumder, Mark
    Banbury, Colby
    Meyer, Josh
    Warden, Pete
    Reddi, Vijay Janapa
    INTERSPEECH 2021, 2021, : 4214 - 4218
  • [42] Language Models Can Improve Event Prediction by Few-Shot Abductive Reasoning
    Shi, Xiaoming
    Xue, Siqiao
    Wang, Kangrui
    Zhou, Fan
    Zhang, James Y.
    Zhou, Jun
    Tan, Chenhao
    Mei, Hongyuan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [43] Few-shot Knowledge Graph-to-Text Generation with Pretrained Language Models
    Li, Junyi
    Tang, Tianyi
    Zhao, Wayne Xin
    Wei, Zhicheng
    Yuan, Nicholas Jing
    Wen, Ji-Rong
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 1558 - 1568
  • [44] ZARA: Improving Few-Shot Self-Rationalization for Small Language Models
    Chen, Wei-Lin
    Yen, An-Zi
    Wu, Cheng-Kuang
    Huang, Hen-Hsen
    Chen, Hsin-Hsi
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 4682 - 4693
  • [45] PERFECT: Prompt-free and Efficient Few-shot Learning with Language Models
    Mahabadi, Rabeeh Karimi
    Zettlemoyer, Luke
    Henderson, James
    Saeidi, Marzieh
    Mathias, Lambert
    Stoyanov, Veselin
    Yazdani, Majid
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 3638 - 3652
  • [46] A Closer Look at the Few-Shot Adaptation of Large Vision-Language Models
Silva-Rodríguez, Julio
    Hajimiri, Sina
    Ben Ayed, Ismail
    Dolz, Jose
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 23681 - 23690
  • [47] Unsupervised and Few-Shot Parsing from Pretrained Language Models (Extended Abstract)
    Zeng, Zhiyuan
    Xiong, Deyi
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 6995 - 7000
  • [48] A few-shot learning method based on knowledge graph in large language models
    Wang, Feilong
    Shi, Donghui
    Aguilar, Jose
    Cui, Xinyi
    INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS, 2024,
  • [49] Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models
    Logan, Robert L.
    Balazevic, Ivana
    Wallace, Eric
    Petroni, Fabio
    Singh, Sameer
    Riedel, Sebastian
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 2824 - 2835
  • [50] Attentional Meta-learners for Few-shot Polythetic Classification
    Day, Ben
    Vinas, Ramon
    Simidjievski, Nikola
    Lio, Pietro
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,