Reward estimation with scheduled knowledge distillation for dialogue policy learning

Cited by: 2
Authors
Qiu, Junyan [1 ]
Zhang, Haidong [2 ]
Yang, Yiping [2 ]
Affiliations
[1] University of Chinese Academy of Sciences, Beijing, China
[2] Institute of Automation, Chinese Academy of Sciences, Beijing, China
Keywords
Reinforcement learning; dialogue policy learning; curriculum learning; knowledge distillation
DOI
10.1080/09540091.2023.2174078
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Formulating dialogue policy as a reinforcement learning (RL) task enables a dialogue system to act optimally by interacting with humans. However, typical RL-based methods suffer from challenges such as sparse and delayed rewards. Moreover, because the user goal is unavailable in real scenarios, the reward estimator cannot generate rewards that reflect action validity and task completion. These issues can significantly slow down and degrade policy learning. In this paper, we present a novel scheduled knowledge distillation framework for dialogue policy learning, which trains a compact student reward estimator by distilling prior knowledge of user goals from a large teacher model. To further improve the stability of dialogue policy learning, we leverage self-paced learning to arrange a meaningful training order for the student reward estimator. Comprehensive experiments on the Microsoft Dialogue Challenge and MultiWOZ datasets indicate that our approach significantly accelerates learning and improves the task-completion success rate by 0.47% to 9.01% over several strong baselines.
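
The abstract outlines two mechanisms: distilling a goal-aware teacher reward estimator into a compact student that sees only the dialogue context, and scheduling the student's training examples via self-paced learning. Below is a minimal PyTorch sketch of these two ideas, not the paper's actual implementation; the names (RewardEstimator, self_paced_weights, distill_step), the hard self-paced regularizer, and all dimensions are illustrative assumptions.

# Minimal sketch (PyTorch) of teacher-student reward distillation with
# self-paced example weighting; all names and shapes are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardEstimator(nn.Module):
    """Scores a dialogue state-action pair; the teacher variant also
    consumes the user goal, which is available only during training."""
    def __init__(self, in_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

def self_paced_weights(losses: torch.Tensor, lam: float) -> torch.Tensor:
    # Hard self-paced regularizer: admit only examples whose current loss
    # falls below the threshold lam; growing lam over training lets harder
    # examples enter later, giving a meaningful training order.
    return (losses.detach() < lam).float()

def distill_step(student, teacher, optimizer, state_action, goal, lam):
    # The teacher sees the user goal; the student must match the teacher's
    # reward predictions from the dialogue context alone.
    with torch.no_grad():
        target = teacher(torch.cat([state_action, goal], dim=-1))
    pred = student(state_action)
    per_example = F.mse_loss(pred, target, reduction="none")
    weights = self_paced_weights(per_example, lam)
    loss = (weights * per_example).sum() / weights.sum().clamp(min=1.0)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage with assumed dimensions (state-action: 64, goal: 16):
# teacher = RewardEstimator(in_dim=64 + 16)   # pretrained with goal access
# student = RewardEstimator(in_dim=64)
# opt = torch.optim.Adam(student.parameters(), lr=1e-3)
# distill_step(student, teacher, opt, sa_batch, goal_batch, lam=0.5)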
Pages: 28