Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks

Cited by: 0
Authors
Kang, Minki [1 ,2 ,5 ]
Lee, Seanie [2 ]
Baek, Jinheon [2 ]
Kawaguchi, Kenji [3 ]
Hwang, Sung Ju [2 ,4 ]
Affiliations
[1] KRAFTON, Seongnam, South Korea
[2] Korea Adv Inst Sci & Technol, Daejeon, South Korea
[3] Natl Univ Singapore, Singapore, Singapore
[4] DeepAuto Ai, Seoul, South Korea
[5] AITRICS, Seoul, South Korea
Funding
National Research Foundation, Singapore
DOI
Not available
Chinese Library Classification
TP18 (Artificial Intelligence Theory)
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Large Language Models (LLMs) have shown promising performance on knowledge-intensive reasoning tasks that require a compound understanding of knowledge. However, deploying LLMs in real-world applications can be challenging due to their high computational requirements and concerns about data privacy. Previous studies have focused on building task-specific small language models (LMs) by fine-tuning them with labeled data or by distilling LLMs. However, these approaches are ill-suited for knowledge-intensive reasoning tasks because small LMs have limited capacity for memorizing the required knowledge. Motivated by our theoretical analysis of memorization, we propose Knowledge-Augmented Reasoning Distillation (KARD), a novel method that fine-tunes small LMs to generate rationales obtained from LLMs, augmented with knowledge retrieved from an external knowledge base. We further propose a neural reranker to obtain documents relevant to rationale generation. We empirically show that KARD significantly improves the performance of small T5 and GPT models on challenging knowledge-intensive reasoning datasets, namely MedQA-USMLE, StrategyQA, and OpenbookQA. Notably, our method enables 250M-parameter T5 models to outperform fine-tuned 3B models, which have 12 times more parameters, on both the MedQA-USMLE and StrategyQA benchmarks.
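The pipeline the abstract describes (retrieve external knowledge, augment the question with it, and train the small LM to reproduce the LLM's rationale) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `retrieve`, `build_training_example`, and the word-overlap scoring are hypothetical stand-ins for the paper's retriever, neural reranker, and actual training code.

```python
import re

def tokens(text):
    """Lowercase word tokens; a crude stand-in for real retrieval features."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question, corpus, k=2):
    """Rank passages by word overlap with the question. In KARD this role is
    played by a retriever plus a neural reranker trained to surface documents
    relevant to rationale generation, not mere lexical overlap."""
    return sorted(corpus, key=lambda p: -len(tokens(question) & tokens(p)))[:k]

def build_training_example(question, rationale, answer, corpus, k=2):
    """Build one distillation pair: the small LM's input is the question
    augmented with retrieved knowledge, and its target is the LLM-generated
    rationale followed by the answer."""
    passages = retrieve(question, corpus, k)
    source = question + "\n" + "\n".join(f"Knowledge: {p}" for p in passages)
    target = f"{rationale} Answer: {answer}"
    return source, target
```

A seq2seq model such as T5 would then be fine-tuned on these (source, target) pairs, so that at inference time the small LM needs only retrieved documents, not the LLM.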
Pages: 30
Related Papers
50 results
  • [1] Characterizing Large Language Models as Rationalizers of Knowledge-intensive Tasks
    Mishra, Aditi
    Rahman, Sajjadur
    Mitra, Kushan
    Kim, Hannah
    Hruschka, Estevam
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 8117 - 8139
  • [2] Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
    Lewis, Patrick
    Perez, Ethan
    Piktus, Aleksandra
    Petroni, Fabio
    Karpukhin, Vladimir
    Goyal, Naman
    Küttler, Heinrich
    Lewis, Mike
    Yih, Wen-tau
    Rocktäschel, Tim
    Riedel, Sebastian
    Kiela, Douwe
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [3] Knowledge-Augmented Language Model Verification
    Baek, Jinheon
    Jeong, Soyeong
    Kang, Minki
    Park, Jong C.
    Hwang, Sung Ju
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 1720 - 1736
  • [4] Knowledge-Augmented Language Models for Cause-Effect Relation Classification
    Hosseini, Pedram
    Broniatowski, David A.
    Diab, Mona
    PROCEEDINGS OF THE FIRST WORKSHOP ON COMMONSENSE REPRESENTATION AND REASONING (CSRR 2022), 2022, : 43 - 48
  • [5] KALA: Knowledge-Augmented Language Model Adaptation
    Kang, Minki
    Baek, Jinheon
    Hwang, Sung Ju
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 5144 - 5167
  • [6] Construction contract risk identification based on knowledge-augmented language models
    Wong, Saika
    Zheng, Chunmo
    Su, Xing
    Tang, Yinqiu
    COMPUTERS IN INDUSTRY, 2024, 157
  • [7] CorpusLM: Towards a Unified Language Model on Corpus for Knowledge-Intensive Tasks
    Li, Xiaoxi
    Dou, Zhicheng
    Zhou, Yujia
    Liu, Fangchao
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 26 - 37
  • [8] Thai Knowledge-Augmented Language Model Adaptation (ThaiKALA)
    Ruangchutiphophan, Pavaris
    Saetia, Chanatip
    Ayutthaya, Thititorn Seneewong Na
    Chalothorn, Tawunrat
    2023 18TH INTERNATIONAL JOINT SYMPOSIUM ON ARTIFICIAL INTELLIGENCE AND NATURAL LANGUAGE PROCESSING, ISAI-NLP, 2023,
  • [9] A Unified Generative Retriever for Knowledge-Intensive Language Tasks via Prompt Learning
    Chen, Jiangui
    Zhang, Ruqing
    Guo, Jiafeng
    de Rijke, Maarten
    Liu, Yiqun
    Fan, Yixing
    Cheng, Xueqi
    PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 1448 - 1457
  • [10] Knowledge-Augmented Visual Question Answering With Natural Language Explanation
    Xie, Jiayuan
    Cai, Yi
    Chen, Jiali
    Xu, Ruohang
    Wang, Jiexin
    Li, Qing
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 2652 - 2664