Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks

Cited by: 0
Authors
Kang, Minki [1 ,2 ,5 ]
Lee, Seanie [2 ]
Baek, Jinheon [2 ]
Kawaguchi, Kenji [3 ]
Hwang, Sung Ju [2 ,4 ]
Affiliations
[1] KRAFTON, Seongnam, South Korea
[2] Korea Adv Inst Sci & Technol, Daejeon, South Korea
[3] Natl Univ Singapore, Singapore, Singapore
[4] DeepAuto Ai, Seoul, South Korea
[5] AITRICS, Seoul, South Korea
Funding
National Research Foundation of Singapore;
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large Language Models (LLMs) have shown promising performance in knowledge-intensive reasoning tasks that require a compound understanding of knowledge. However, deploying LLMs in real-world applications can be challenging due to their high computational requirements and concerns about data privacy. Previous studies have focused on building task-specific small Language Models (LMs) by fine-tuning them with labeled data or by distilling LLMs. However, these approaches are ill-suited for knowledge-intensive reasoning tasks because small LMs have limited capacity for memorizing the required knowledge. Motivated by our theoretical analysis of memorization, we propose Knowledge-Augmented Reasoning Distillation (KARD), a novel method that fine-tunes small LMs to generate rationales obtained from LLMs, augmented with knowledge retrieved from an external knowledge base. Moreover, we further propose a neural reranker to obtain documents relevant to rationale generation. We empirically show that KARD significantly improves the performance of small T5 and GPT models on the challenging knowledge-intensive reasoning datasets MedQA-USMLE, StrategyQA, and OpenbookQA. Notably, our method enables 250M T5 models to outperform fine-tuned 3B models, which have 12 times more parameters, on both the MedQA-USMLE and StrategyQA benchmarks.
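
The abstract describes the training recipe only at a high level. Purely as an illustration of the idea, and not the authors' implementation, the Python sketch below fine-tunes a small seq2seq LM to reproduce a teacher-written rationale conditioned on retrieved passages; the model name "t5-small", the toy lexical-overlap retriever (a stand-in for the paper's retriever and neural reranker), and the single hand-written example are assumptions made only for this sketch.

# Illustrative sketch (not the authors' code): fine-tune a small seq2seq LM to
# generate a teacher-provided rationale conditioned on retrieved passages, in the
# spirit of knowledge-augmented reasoning distillation.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "t5-small"  # assumed stand-in for the small T5 models in the paper
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Tiny stand-in knowledge base; the paper retrieves from an external corpus.
knowledge_base = [
    "Aspirin irreversibly inhibits cyclooxygenase enzymes.",
    "The hypothalamus regulates body temperature.",
]

def retrieve(question, k=1):
    """Toy lexical-overlap retriever, a stand-in for the retriever + neural reranker."""
    q_terms = set(question.lower().split())
    scores = [len(q_terms & set(doc.lower().split())) for doc in knowledge_base]
    ranked = sorted(range(len(knowledge_base)), key=lambda i: scores[i], reverse=True)
    return [knowledge_base[i] for i in ranked[:k]]

# One hypothetical training example; in KARD the rationale comes from an LLM teacher.
question = "Which enzyme does aspirin inhibit?"
teacher_rationale = "Aspirin blocks cyclooxygenase, so the answer is cyclooxygenase."

passages = " ".join(retrieve(question))
source = f"question: {question} knowledge: {passages}"
inputs = tokenizer(source, return_tensors="pt", truncation=True)
labels = tokenizer(teacher_rationale, return_tensors="pt", truncation=True).input_ids
labels[labels == tokenizer.pad_token_id] = -100  # mask padding in the loss

model.train()
loss = model(**inputs, labels=labels).loss  # standard seq2seq cross-entropy on the rationale
loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"distillation loss: {loss.item():.4f}")

In a full training loop this step would run over many (question, retrieved passages, teacher rationale) triples; at test time the small LM would generate a rationale and answer from the question plus retrieved knowledge alone.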
Pages: 30