Dialogue Generation: From Imitation Learning to Inverse Reinforcement Learning

Cited by: 0
Authors:
Li, Ziming [1]
Kiseleva, Julia [1,2]
de Rijke, Maarten [1]
Affiliations:
[1] Univ Amsterdam, Amsterdam, Netherlands
[2] Microsoft Res AI, Redmond, WA, USA
Keywords: (none listed)
DOI: not available
CLC Classification: TP18 [Artificial Intelligence Theory]
Subject Classification: 081104; 0812; 0835; 1405
Abstract
The performance of adversarial dialogue generation models relies on the quality of the reward signal produced by the discriminator. The reward signal from a poor discriminator can be very sparse and unstable, which may lead the generator to fall into a local optimum or to produce nonsense replies. To alleviate the first problem, we first extend a recently proposed adversarial dialogue generation method to an adversarial imitation learning solution. Then, in the framework of adversarial inverse reinforcement learning, we propose a new reward model for dialogue generation that provides a more accurate and precise reward signal for generator training. We evaluate the performance of the resulting model with automatic metrics and human evaluations in two annotation settings. Our experimental results demonstrate that our model generates higher-quality responses and achieves better overall performance than the state-of-the-art.
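The abstract describes deriving the generator's reward from the discriminator within an adversarial inverse reinforcement learning framework. A minimal sketch of one common way such a reward is computed (the AIRL-style log-ratio of the discriminator's probability that a response is human-written) is shown below; the function name and the use of a raw probability input are illustrative assumptions, not the paper's exact formulation.

```python
import math

def adversarial_reward(d_prob: float) -> float:
    """Reward for a generated response, derived from a discriminator.

    d_prob is the discriminator's estimated probability that the
    response is human-written. The AIRL-style reward is
        r = log D - log(1 - D),
    which is 0 when the discriminator is maximally uncertain (D = 0.5),
    positive when the response looks human, and negative otherwise.
    Clamping avoids log(0) for saturated discriminator outputs.
    """
    eps = 1e-8
    d = min(max(d_prob, eps), 1.0 - eps)
    return math.log(d) - math.log(1.0 - d)
```

This shaping is denser than the binary real/fake signal of a plain GAN-style discriminator, which is one way a reward model can mitigate the sparsity problem the abstract mentions.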
Pages: 6722-6729 (8 pages)
Related Papers (50 total)
  • [31] Kim, Woo Kyung; Yoo, Minjong; Woo, Honguk. Pareto Inverse Reinforcement Learning for Diverse Expert Policy Generation. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI 2024), 2024: 4300-4307.
  • [32] Dey, Sourav; Marzullo, Thibault; Zhang, Xiangyu; Henze, Gregor. Reinforcement learning building control approach harnessing imitation learning. Energy and AI, 2023, 14.
  • [33] Gros, Timo P.; Hoeller, Daniel; Hoffmann, Joerg; Wolf, Verena. Tracking the Race Between Deep Reinforcement Learning and Imitation Learning. Quantitative Evaluation of Systems (QEST 2020), 2020, 12289: 11-17.
  • [34] Tan, Huan; Balajee, Kannan; Lynn, DeRose. Integration of Evolutionary Computing and Reinforcement Learning for Robotic Imitation Learning. 2014 IEEE International Conference on Systems, Man and Cybernetics (SMC), 2014: 407-412.
  • [35] Rashidinejad, Paria; Zhu, Banghua; Ma, Cong; Jiao, Jiantao; Russell, Stuart. Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021, 34.
  • [36] Wang X.; Gu K. A Penetration Strategy Combining Deep Reinforcement Learning and Imitation Learning. Yuhang Xuebao/Journal of Astronautics, 2023, 44(06): 914-925.
  • [37] Goulart, Icaro; Paes, Aline; Clua, Esteban. Learning How to Play Bomberman with Deep Reinforcement and Imitation Learning. Entertainment Computing and Serious Games (ICEC-JCSG 2019), 2019, 11863: 121-133.
  • [38] Guo, Wenxia; Tian, Wenhong; Ye, Yufei; Xu, Lingxiao; Wu, Kui. Cloud Resource Scheduling With Deep Reinforcement Learning and Imitation Learning. IEEE Internet of Things Journal, 2021, 8(05): 3576-3586.
  • [39] Rashidinejad, Paria; Zhu, Banghua; Ma, Cong; Jiao, Jiantao; Russell, Stuart. Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism. IEEE Transactions on Information Theory, 2022, 68(12): 8156-8196.
  • [40] Yang, Chao; Ma, Xiaojian; Huang, Wenbing; Sun, Fuchun; Liu, Huaping; Huang, Junzhou; Gan, Chuang. Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement. Advances in Neural Information Processing Systems 32 (NeurIPS 2019), 2019, 32.