ZARA: Improving Few-Shot Self-Rationalization for Small Language Models

Cited: 0
Authors
Chen, Wei-Lin [1 ]
Yen, An-Zi [2 ]
Wu, Cheng-Kuang [1 ]
Huang, Hen-Hsen [3 ]
Chen, Hsin-Hsi [1 ]
Affiliations
[1] Natl Taiwan Univ, Taipei, Taiwan
[2] Natl Yang Ming Chiao Tung Univ, Taipei, Taiwan
[3] Acad Sinica, Taipei, Taiwan
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory];
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Language models (LMs) that jointly generate end-task answers and free-text rationales are known as self-rationalization models. Recent work demonstrates substantial performance gains for self-rationalization by few-shot prompting LMs with rationale-augmented exemplars. However, the ability to benefit from explanations emerges only in large-scale LMs, which are poorly accessible. In this work, we explore the less-studied setting of leveraging explanations to improve few-shot self-rationalization for small LMs. We first revisit the relationship between rationales and answers. Inspired by the implicit mental process by which humans assess explanations, we present a novel approach, Zero-shot Augmentation of Rationale-Answer pairs (ZARA), which automatically constructs pseudo-parallel data for self-training by reducing the problem of plausibility judgement to natural language inference. Experimental results show that ZARA achieves SOTA performance on the FEB benchmark for both task accuracy and the explanation metric. In addition, we conduct human and quantitative evaluations validating ZARA's ability to automatically identify plausible and accurate rationale-answer pairs. (1)
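The core idea in the abstract — judging whether a generated (rationale, answer) pair is plausible by casting the check as natural language inference, and keeping only high-confidence pairs as pseudo-parallel self-training data — can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: `zara_filter`, `Candidate`, and the `toy_entail_prob` scorer are hypothetical names, and a real system would replace the toy scorer with an actual NLI model's entailment probability.

```python
# Hypothetical sketch of NLI-based plausibility filtering in the spirit of ZARA.
# A (rationale, answer) pair is kept when "question + rationale" entails the
# answer with probability >= threshold; kept pairs become self-training data.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    question: str
    rationale: str  # free-text explanation generated by the small LM
    answer: str     # end-task answer generated alongside the rationale


def zara_filter(
    candidates: List[Candidate],
    entail_prob: Callable[[str, str], float],
    threshold: float = 0.9,
) -> List[Candidate]:
    """Keep pairs whose rationale entails the answer with high confidence.

    Premise    = question + rationale
    Hypothesis = the answer restated as a statement
    """
    kept = []
    for c in candidates:
        premise = f"{c.question} {c.rationale}"
        if entail_prob(premise, c.answer) >= threshold:
            kept.append(c)
    return kept


# Toy stand-in for an NLI model: scores entailment by the fraction of the
# hypothesis's words that appear in the premise. Illustration only.
def toy_entail_prob(premise: str, hypothesis: str) -> float:
    words = hypothesis.lower().split()
    hits = sum(w in premise.lower() for w in words)
    return hits / max(len(words), 1)


pool = [
    Candidate("Is ice colder than steam?",
              "Ice is frozen water, so ice is colder than steam.",
              "ice is colder"),
    Candidate("Is ice colder than steam?",
              "Steam rises from kettles.",
              "steam is warmer"),
]
plausible = zara_filter(pool, toy_entail_prob, threshold=0.9)
print(len(plausible))  # → 1: only the first pair's rationale supports its answer
```

With a real NLI model supplying `entail_prob`, the surviving pairs would be fed back as training examples, which is what makes the filtering step suitable for zero-shot data augmentation.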
Pages: 4682-4693 (12 pages)