Preference-learning based Inverse Reinforcement Learning for Dialog Control

Cited by: 0
Authors
Sugiyama, Hiroaki [1 ]
Meguro, Toyomi [1 ]
Minami, Yasuhiro [1 ]
Affiliations
[1] NTT Commun Sci Labs, Kyoto, Japan
Keywords
DOI
Not available
CLC (Chinese Library Classification)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Dialog systems that realize dialog control with reinforcement learning have recently been proposed. However, reinforcement learning has an open problem: it requires a reward function that is difficult to set appropriately. To set an appropriate reward function automatically, we propose preference-learning based inverse reinforcement learning (PIRL), which estimates a reward function from dialog sequences and their pairwise preferences, computed from ratings annotated on the sequences. Inverse reinforcement learning finds a reward function with which a system generates sequences similar to the training ones. This means that current IRL assumes the training sequences are all equally appropriate for a given task and therefore cannot utilize the ratings. In contrast, our PIRL exploits the pairwise preferences derived from the ratings to estimate the reward function. We examine the advantages of PIRL through comparisons with competitive algorithms that have been widely used for dialog control. Our experiments show that PIRL outperforms the other algorithms and has the potential to serve as an evaluation simulator for dialog control.
Pages: 222-225
Page count: 4
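
The abstract describes estimating a reward function from pairwise preferences between rated dialog sequences. As a rough, hypothetical illustration (not the paper's actual formulation), the sketch below assumes a linear reward model r(s, a) = w . phi(s, a) and a Bradley-Terry (logistic) loss on the difference of sequence returns, fit by gradient descent; the function names, hyperparameters, and toy data are all invented for illustration.

import numpy as np

def sequence_return(w, features):
    # Return of a dialog sequence: sum of per-step linear rewards w . phi(s, a).
    return np.dot(features, w).sum()

def fit_reward(feature_seqs, preferences, dim, lr=0.1, epochs=200):
    # Estimate reward weights w from pairwise preferences.
    # feature_seqs: list of (T_i, dim) arrays, one per dialog sequence.
    # preferences:  list of (i, j) pairs meaning sequence i is rated above sequence j.
    w = np.zeros(dim)
    for _ in range(epochs):
        grad = np.zeros(dim)
        for i, j in preferences:
            # Bradley-Terry model: P(i preferred over j) = sigmoid(R_i - R_j).
            diff = sequence_return(w, feature_seqs[i]) - sequence_return(w, feature_seqs[j])
            p = 1.0 / (1.0 + np.exp(-diff))
            # Gradient of the negative log-likelihood with respect to w.
            feat_diff = feature_seqs[i].sum(axis=0) - feature_seqs[j].sum(axis=0)
            grad -= (1.0 - p) * feat_diff
        w -= lr * grad / max(len(preferences), 1)
    return w

# Toy usage: two 3-step dialogs with 2-dimensional step features;
# the first sequence is rated higher than the second.
seqs = [np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]),
        np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]])]
w_hat = fit_reward(seqs, preferences=[(0, 1)], dim=2)
print("estimated reward weights:", w_hat)

Under these assumptions the learned weights simply push the returns of preferred sequences above those of less-preferred ones; the paper's method may differ in reward parameterization and optimization.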