Language Models are Few-Shot Learners

Authors
Brown, Tom B.
Mann, Benjamin
Ryder, Nick
Subbiah, Melanie
Kaplan, Jared [1,2]
Dhariwal, Prafulla
Neelakantan, Arvind
Shyam, Pranav
Sastry, Girish
Askell, Amanda
Agarwal, Sandhini
Herbert-Voss, Ariel
Krueger, Gretchen
Henighan, Tom
Child, Rewon
Ramesh, Aditya
Ziegler, Daniel M.
Wu, Jeffrey
Winter, Clemens
Hesse, Christopher
Chen, Mark
Sigler, Eric
Litwin, Mateusz
Gray, Scott
Chess, Benjamin
Clark, Jack
Berner, Christopher
McCandlish, Sam
Radford, Alec
Sutskever, Ilya
Amodei, Dario
Affiliations
[1] Johns Hopkins Univ, Baltimore, MD 21218 USA
[2] OpenAI, San Francisco, CA 94110 USA
DOI
Not available
CLC Classification Code
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
We demonstrate that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even becoming competitive with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks. We also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora.
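The few-shot setting the abstract describes is in-context learning: a natural-language task description and K solved demonstrations are concatenated into a single text prompt, the unsolved query is appended, and the frozen autoregressive model completes the text with no gradient updates. Below is a minimal Python sketch of such a prompt, using an English-to-French translation task in the spirit of the paper's illustrative figures; the function name, prompt wording, and demonstration pairs are illustrative assumptions, and the call to an actual model is left abstract.

# Build a few-shot prompt as plain text: task description, K solved
# demonstrations, then the unsolved query. The wording and translation
# pairs below are illustrative assumptions, not taken from the paper.
def build_few_shot_prompt(task_description, demonstrations, query):
    lines = [task_description, ""]
    for source, target in demonstrations:
        lines.append(f"English: {source}")
        lines.append(f"French: {target}")
        lines.append("")
    lines.append(f"English: {query}")
    lines.append("French:")  # the frozen model generates the completion from here
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("sea otter", "loutre de mer"), ("cheese", "fromage")],  # K = 2 shots
    "peppermint",
)
print(prompt)
# The resulting string is the model's entire task specification: the model's
# weights are never updated, which is what "without any gradient updates or
# fine-tuning" means in the abstract.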
Pages: 25
Related Papers
50 in total (entries [21]-[30] shown)
  • [21] Task Contamination: Language Models May Not Be Few-Shot Anymore
    Li, Changmao
    Flanigan, Jeffrey
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 16, 2024: 18471-18480
  • [22] ATLAS: Few-shot Learning with Retrieval Augmented Language Models
    Izacard, Gautier
    Lewis, Patrick
    Lomeli, Maria
    Hosseini, Lucas
    Petroni, Fabio
    Schick, Timo
    Dwivedi-Yu, Jane
    Joulin, Armand
    Riedel, Sebastian
    Grave, Edouard
    JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [23] mGPT: Few-Shot Learners Go Multilingual
    Shliazhko, Oleh
    Fenogenova, Alena
    Tikhonova, Maria
    Kozlova, Anastasia
    Mikhailov, Vladislav
    Shavrina, Tatiana
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2024, 12: 58-79
  • [24] Constrained Language Models Yield Few-Shot Semantic Parsers
    Shin, Richard
    Lin, Christopher H.
    Thomson, Sam
    Chen, Charles
    Roy, Subhro
    Platanios, Emmanouil Antonios
    Pauls, Adam
    Klein, Dan
    Eisner, Jason
    Van Durme, Benjamin
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021: 7699-7715
  • [25] Few-Shot Adaptation of Medical Vision-Language Models
    Shakeri, Fereshteh
    Huang, Yunshi
    Silva-Rodriguez, Julio
    Bahig, Houda
    Tang, An
    Dolz, Jose
    Ben Ayed, Ismail
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT XII, 2024, 15012: 553-563
  • [26] Getting to Production with Few-shot Natural Language Generation Models
    Heidari, Peyman
    Einolghozati, Arash
    Jain, Shashank
    Batra, Soumya
    Callender, Lee
    Arun, Ankit
    Mei, Shawn
    Gupta, Sonal
    Donmez, Pinar
    Bhardwaj, Vikas
    Kumar, Anuj
    White, Michael
    SIGDIAL 2021: 22ND ANNUAL MEETING OF THE SPECIAL INTEREST GROUP ON DISCOURSE AND DIALOGUE (SIGDIAL 2021), 2021: 66-76
  • [27] Learning Meta Soft Prompt for Few-Shot Language Models
    Chien, Jen-Tzung
    Chen, Ming-Yen
    Xue, Jing-Hao
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023: 57-62
  • [28] Few-Shot Semantic Parsing with Language Models Trained on Code
    Shin, Richard
    Van Durme, Benjamin
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022: 5417-5425
  • [29] WAVPROMPT: Towards Few-Shot Spoken Language Understanding with Frozen Language Models
    Gao, Heting
    Ni, Junrui
    Qian, Kaizhi
    Zhang, Yang
    Chang, Shiyu
    Hasegawa-Johnson, Mark
    INTERSPEECH 2022, 2022: 2738-2742
  • [30] CLIP Models are Few-shot Learners: Empirical Studies on VQA and Visual Entailment
    Song, Haoyu
    Dong, Li
    Zhang, Wei-Nan
    Liu, Ting
    Wei, Furu
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022: 6088-6100