Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks

Cited by: 0
Authors
Kang, Minki [1 ,2 ,5 ]
Lee, Seanie [2 ]
Baek, Jinheon [2 ]
Kawaguchi, Kenji [3 ]
Hwang, Sung Ju [2 ,4 ]
Affiliations
[1] KRAFTON, Seongnam, South Korea
[2] Korea Adv Inst Sci & Technol, Daejeon, South Korea
[3] Natl Univ Singapore, Singapore, Singapore
[4] DeepAuto Ai, Seoul, South Korea
[5] AITRICS, Seoul, South Korea
Funding
National Research Foundation of Singapore;
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large Language Models (LLMs) have shown promising performance in knowledge-intensive reasoning tasks that require a compound understanding of knowledge. However, deploying LLMs in real-world applications can be challenging due to their high computational requirements and concerns about data privacy. Previous studies have focused on building task-specific small Language Models (LMs) by fine-tuning them with labeled data or by distilling LLMs. However, these approaches are ill-suited for knowledge-intensive reasoning tasks because small LMs have limited capacity for memorizing the required knowledge. Motivated by our theoretical analysis of memorization, we propose Knowledge-Augmented Reasoning Distillation (KARD), a novel method that fine-tunes small LMs to generate rationales obtained from LLMs, augmented with knowledge retrieved from an external knowledge base. Moreover, we further propose a neural reranker to obtain documents relevant to rationale generation. We empirically show that KARD significantly improves the performance of small T5 and GPT models on the challenging knowledge-intensive reasoning datasets MedQA-USMLE, StrategyQA, and OpenbookQA. Notably, our method enables 250M T5 models to outperform fine-tuned 3B models, which have 12 times more parameters, on both the MedQA-USMLE and StrategyQA benchmarks.
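
The abstract describes the training recipe only at a high level. Purely as an illustration of the idea, and not the authors' implementation, the Python sketch below fine-tunes a small seq2seq LM to reproduce a teacher-written rationale conditioned on retrieved passages; the model name "t5-small", the toy lexical-overlap retriever (a stand-in for the paper's retriever and neural reranker), and the single hand-written example are assumptions made only for this sketch.

# Illustrative sketch (not the authors' code): fine-tune a small seq2seq LM to
# generate a teacher-provided rationale conditioned on retrieved passages, in the
# spirit of knowledge-augmented reasoning distillation.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "t5-small"  # assumed stand-in for the small T5 models in the paper
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Tiny stand-in knowledge base; the paper retrieves from an external corpus.
knowledge_base = [
    "Aspirin irreversibly inhibits cyclooxygenase enzymes.",
    "The hypothalamus regulates body temperature.",
]

def retrieve(question, k=1):
    """Toy lexical-overlap retriever, a stand-in for the retriever + neural reranker."""
    q_terms = set(question.lower().split())
    scores = [len(q_terms & set(doc.lower().split())) for doc in knowledge_base]
    ranked = sorted(range(len(knowledge_base)), key=lambda i: scores[i], reverse=True)
    return [knowledge_base[i] for i in ranked[:k]]

# One hypothetical training example; in KARD the rationale comes from an LLM teacher.
question = "Which enzyme does aspirin inhibit?"
teacher_rationale = "Aspirin blocks cyclooxygenase, so the answer is cyclooxygenase."

passages = " ".join(retrieve(question))
source = f"question: {question} knowledge: {passages}"
inputs = tokenizer(source, return_tensors="pt", truncation=True)
labels = tokenizer(teacher_rationale, return_tensors="pt", truncation=True).input_ids
labels[labels == tokenizer.pad_token_id] = -100  # mask padding in the loss

model.train()
loss = model(**inputs, labels=labels).loss  # standard seq2seq cross-entropy on the rationale
loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"distillation loss: {loss.item():.4f}")

In a full training loop this step would run over many (question, retrieved passages, teacher rationale) triples; at test time the small LM would generate a rationale and answer from the question plus retrieved knowledge alone.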
Pages: 30