Reward estimation with scheduled knowledge distillation for dialogue policy learning

Cited by: 2
Authors
Qiu, Junyan [1 ]
Zhang, Haidong [2 ]
Yang, Yiping [2 ]
Affiliations
[1] Univ Chinese Acad Sci, Beijing, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
Keywords
Reinforcement learning; dialogue policy learning; curriculum learning; knowledge distillation
DOI
10.1080/09540091.2023.2174078
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Formulating dialogue policy as a reinforcement learning (RL) task enables a dialogue system to act optimally by interacting with humans. However, typical RL-based methods suffer from challenges such as sparse and delayed rewards. Moreover, because the user goal is unavailable in real scenarios, the reward estimator cannot generate rewards that reflect action validity and task completion. These issues can significantly slow down and degrade policy learning. In this paper, we present a novel scheduled knowledge distillation framework for dialogue policy learning, which trains a compact student reward estimator by distilling prior knowledge of user goals from a large teacher model. To further improve the stability of dialogue policy learning, we leverage self-paced learning to arrange a meaningful training order for the student reward estimator. Comprehensive experiments on the Microsoft Dialogue Challenge and MultiWOZ datasets indicate that our approach significantly accelerates learning and improves the task-completion success rate by 0.47% to 9.01% over several strong baselines.
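To make the abstract's idea concrete, the following is a minimal sketch (not the paper's implementation) of distilling a goal-aware teacher reward estimator into a compact student that only sees dialogue state-action features, with a hard self-paced weighting that admits easy samples first. The network sizes, feature dimensions, threshold schedule, and all function names are illustrative assumptions.

# Minimal sketch, assuming a teacher reward estimator with access to user-goal
# features and a smaller student without them; sizes and schedules are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RewardEstimator(nn.Module):
    """MLP that maps a (state, action) feature vector to a scalar reward."""

    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)


def self_paced_weights(losses: torch.Tensor, threshold: float) -> torch.Tensor:
    """Hard self-paced regulariser: keep only samples whose loss is below the threshold."""
    return (losses.detach() < threshold).float()


def distillation_step(teacher, student, optimizer, batch, threshold):
    """One training step: the student regresses the goal-aware teacher's reward."""
    state_action, state_action_with_goal = batch
    with torch.no_grad():
        target = teacher(state_action_with_goal)   # teacher sees user-goal features
    pred = student(state_action)                   # student does not
    per_sample = F.mse_loss(pred, target, reduction="none")
    w = self_paced_weights(per_sample, threshold)  # easy samples enter training first
    loss = (w * per_sample).sum() / w.sum().clamp(min=1.0)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    # Illustrative dimensions: 32 state-action features, plus 8 goal features for the teacher.
    teacher = RewardEstimator(input_dim=40, hidden_dim=128)
    student = RewardEstimator(input_dim=32, hidden_dim=64)
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    batch = (torch.randn(16, 32), torch.randn(16, 40))
    for step in range(5):
        # Gradually relax the threshold so harder samples are admitted later.
        distillation_step(teacher, student, opt, batch, threshold=0.5 + 0.5 * step)

In the paper's setting the teacher would be pre-trained with access to user goals and the distilled student would then supply reward signals during RL-based policy learning; here both networks are randomly initialised purely to keep the sketch self-contained and runnable.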
Pages: 28
Related papers
50 records in total
  • [21] Block change learning for knowledge distillation
    Choi, Hyunguk
    Lee, Younkwan
    Yow, Kin Choong
    Jeon, Moongu
    INFORMATION SCIENCES, 2020, 513: 360 - 371
  • [22] Skill enhancement learning with knowledge distillation
    Liu, Naijun
    Sun, Fuchun
    Fang, Bin
    Liu, Huaping
    SCIENCE CHINA-INFORMATION SCIENCES, 2024, 67 (08)
  • [23] Continual Learning With Knowledge Distillation: A Survey
    Li, Songze
    Su, Tonghua
    Zhang, Xuyao
    Wang, Zhongjie
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024
  • [24] KNOWLEDGE DISTILLATION FOR WIRELESS EDGE LEARNING
    Mohamed, Ahmed P.
    Fameel, Abu Shafin Mohammad Mandee
    El Gamal, Aly
    2021 IEEE STATISTICAL SIGNAL PROCESSING WORKSHOP (SSP), 2021, : 600 - 604
  • [25] Noise as a Resource for Learning in Knowledge Distillation
    Arani, Elahe
    Sarfraz, Fahad
    Zonooz, Bahram
    2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021, 2021, : 3128 - 3137
  • [26] Learning Interpretation with Explainable Knowledge Distillation
    Alharbi, Raed
    Vu, Minh N.
    Thai, My T.
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 705 - 714
  • [27] A Survey of Knowledge Distillation in Deep Learning
    Shao R.-R.
    Liu Y.-A.
    Zhang W.
    Wang J.
    Jisuanji Xuebao/Chinese Journal of Computers, 2022, 45 (08): 1638 - 1673
  • [28] Skill enhancement learning with knowledge distillation
    Naijun LIU
    Fuchun SUN
    Bin FANG
    Huaping LIU
    Science China (Information Sciences), 2024, 67 (08): 206 - 220
  • [29] BookKD: A novel knowledge distillation for reducing distillation costs by decoupling knowledge generation and learning
    Zhu, Songling
    Shang, Ronghua
    Tang, Ke
    Xu, Songhua
    Li, Yangyang
    KNOWLEDGE-BASED SYSTEMS, 2023, 279
  • [30] EFFECTIVE KNOWLEDGE DISTILLATION FOR HUMAN POSE ESTIMATION
    Zhou, Yang
    Gu, Xiaofeng
    Fu, Hong
    Li, Na
    Du, Xuemei
    Kuang, Ping
    2019 16TH INTERNATIONAL COMPUTER CONFERENCE ON WAVELET ACTIVE MEDIA TECHNOLOGY AND INFORMATION PROCESSING (ICWAMTIP), 2019, : 170 - 173