Reward estimation with scheduled knowledge distillation for dialogue policy learning

Cited by: 2
Authors
Qiu, Junyan [1 ]
Zhang, Haidong [2 ]
Yang, Yiping [2 ]
Affiliations
[1] Univ Chinese Acad Sci, Beijing, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
Keywords
Reinforcement learning; dialogue policy learning; curriculum learning; knowledge distillation
DOI
10.1080/09540091.2023.2174078
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Formulating dialogue policy as a reinforcement learning (RL) task enables a dialogue system to act optimally by interacting with humans. However, typical RL-based methods suffer from challenges such as sparse and delayed rewards. Moreover, because the user goal is unavailable in real scenarios, the reward estimator cannot generate rewards that reflect action validity and task completion. These issues can significantly slow down and degrade policy learning. In this paper, we present a novel scheduled knowledge distillation framework for dialogue policy learning, which trains a compact student reward estimator by distilling prior knowledge of user goals from a large teacher model. To further improve the stability of dialogue policy learning, we leverage self-paced learning to arrange a meaningful training order for the student reward estimator. Comprehensive experiments on the Microsoft Dialogue Challenge and MultiWOZ datasets indicate that our approach significantly accelerates learning and improves the task-completion success rate by 0.47% to 9.01% over several strong baselines.
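To make the abstract's idea concrete, the following is a minimal sketch (not the paper's implementation) of distilling a goal-aware teacher reward estimator into a compact student that only sees dialogue state-action features, with a hard self-paced weighting that admits easy samples first. The network sizes, feature dimensions, threshold schedule, and all function names are illustrative assumptions.

# Minimal sketch, assuming a teacher reward estimator with access to user-goal
# features and a smaller student without them; sizes and schedules are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RewardEstimator(nn.Module):
    """MLP that maps a (state, action) feature vector to a scalar reward."""

    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)


def self_paced_weights(losses: torch.Tensor, threshold: float) -> torch.Tensor:
    """Hard self-paced regulariser: keep only samples whose loss is below the threshold."""
    return (losses.detach() < threshold).float()


def distillation_step(teacher, student, optimizer, batch, threshold):
    """One training step: the student regresses the goal-aware teacher's reward."""
    state_action, state_action_with_goal = batch
    with torch.no_grad():
        target = teacher(state_action_with_goal)   # teacher sees user-goal features
    pred = student(state_action)                   # student does not
    per_sample = F.mse_loss(pred, target, reduction="none")
    w = self_paced_weights(per_sample, threshold)  # easy samples enter training first
    loss = (w * per_sample).sum() / w.sum().clamp(min=1.0)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    # Illustrative dimensions: 32 state-action features, plus 8 goal features for the teacher.
    teacher = RewardEstimator(input_dim=40, hidden_dim=128)
    student = RewardEstimator(input_dim=32, hidden_dim=64)
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    batch = (torch.randn(16, 32), torch.randn(16, 40))
    for step in range(5):
        # Gradually relax the threshold so harder samples are admitted later.
        distillation_step(teacher, student, opt, batch, threshold=0.5 + 0.5 * step)

In the paper's setting the teacher would be pre-trained with access to user goals and the distilled student would then supply reward signals during RL-based policy learning; here both networks are randomly initialised purely to keep the sketch self-contained and runnable.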
Pages: 28
Related papers
50 records in total
  • [21] Block change learning for knowledge distillation
    Choi, Hyunguk
    Lee, Younkwan
    Yow, Kin Choong
    Jeon, Moongu
    INFORMATION SCIENCES, 2020, 513: 360 - 371
  • [22] Skill enhancement learning with knowledge distillation
    Liu, Naijun
    Sun, Fuchun
    Fang, Bin
    Liu, Huaping
    SCIENCE CHINA-INFORMATION SCIENCES, 2024, 67 (08)
  • [23] Continual Learning With Knowledge Distillation: A Survey
    Li, Songze
    Su, Tonghua
    Zhang, Xuyao
    Wang, Zhongjie
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024
  • [24] KNOWLEDGE DISTILLATION FOR WIRELESS EDGE LEARNING
    Mohamed, Ahmed P.
    Fameel, Abu Shafin Mohammad Mandee
    El Gamal, Aly
    2021 IEEE STATISTICAL SIGNAL PROCESSING WORKSHOP (SSP), 2021, : 600 - 604
  • [25] Noise as a Resource for Learning in Knowledge Distillation
    Arani, Elahe
    Sarfraz, Fahad
    Zonooz, Bahram
    2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021, 2021, : 3128 - 3137
  • [26] Learning Interpretation with Explainable Knowledge Distillation
    Alharbi, Raed
    Vu, Minh N.
    Thai, My T.
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 705 - 714
  • [27] A Survey of Knowledge Distillation in Deep Learning
    Shao R.-R.
    Liu Y.-A.
    Zhang W.
    Wang J.
    Jisuanji Xuebao/Chinese Journal of Computers, 2022, 45 (08): 1638 - 1673
  • [28] Skill enhancement learning with knowledge distillation
    Naijun LIU
    Fuchun SUN
    Bin FANG
    Huaping LIU
    Science China (Information Sciences), 2024, 67 (08): 206 - 220
  • [29] BookKD: A novel knowledge distillation for reducing distillation costs by decoupling knowledge generation and learning
    Zhu, Songling
    Shang, Ronghua
    Tang, Ke
    Xu, Songhua
    Li, Yangyang
    KNOWLEDGE-BASED SYSTEMS, 2023, 279
  • [30] EFFECTIVE KNOWLEDGE DISTILLATION FOR HUMAN POSE ESTIMATION
    Zhou, Yang
    Gu, Xiaofeng
    Fu, Hong
    Li, Na
    Du, Xuemei
    Kuang, Ping
    2019 16TH INTERNATIONAL COMPUTER CONFERENCE ON WAVELET ACTIVE MEDIA TECHNOLOGY AND INFORMATION PROCESSING (ICWAMTIP), 2019, : 170 - 173