Preference-learning based Inverse Reinforcement Learning for Dialog Control

Cited by: 0
Authors
Sugiyama, Hiroaki [1 ]
Meguro, Toyomi [1 ]
Minami, Yasuhiro [1 ]
Affiliations
[1] NTT Commun Sci Labs, Kyoto, Japan
Keywords
DOI
Not available
CLC (Chinese Library Classification)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Dialog systems that realize dialog control with reinforcement learning have recently been proposed. However, reinforcement learning has an open problem: it requires a reward function that is difficult to set appropriately. To set an appropriate reward function automatically, we propose preference-learning based inverse reinforcement learning (PIRL), which estimates a reward function from dialog sequences and their pairwise preferences, computed from ratings annotated on the sequences. Inverse reinforcement learning finds a reward function with which a system generates sequences similar to the training ones. This means that current IRL assumes the training sequences are all equally appropriate for a given task and therefore cannot utilize the ratings. In contrast, our PIRL exploits the pairwise preferences derived from the ratings to estimate the reward function. We examine the advantages of PIRL through comparisons with competitive algorithms that have been widely used for dialog control. Our experiments show that PIRL outperforms the other algorithms and has the potential to serve as an evaluation simulator for dialog control.
Pages: 222-225
Page count: 4
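
The abstract describes estimating a reward function from pairwise preferences between rated dialog sequences. As a rough, hypothetical illustration (not the paper's actual formulation), the sketch below assumes a linear reward model r(s, a) = w . phi(s, a) and a Bradley-Terry (logistic) loss on the difference of sequence returns, fit by gradient descent; the function names, hyperparameters, and toy data are all invented for illustration.

import numpy as np

def sequence_return(w, features):
    # Return of a dialog sequence: sum of per-step linear rewards w . phi(s, a).
    return np.dot(features, w).sum()

def fit_reward(feature_seqs, preferences, dim, lr=0.1, epochs=200):
    # Estimate reward weights w from pairwise preferences.
    # feature_seqs: list of (T_i, dim) arrays, one per dialog sequence.
    # preferences:  list of (i, j) pairs meaning sequence i is rated above sequence j.
    w = np.zeros(dim)
    for _ in range(epochs):
        grad = np.zeros(dim)
        for i, j in preferences:
            # Bradley-Terry model: P(i preferred over j) = sigmoid(R_i - R_j).
            diff = sequence_return(w, feature_seqs[i]) - sequence_return(w, feature_seqs[j])
            p = 1.0 / (1.0 + np.exp(-diff))
            # Gradient of the negative log-likelihood with respect to w.
            feat_diff = feature_seqs[i].sum(axis=0) - feature_seqs[j].sum(axis=0)
            grad -= (1.0 - p) * feat_diff
        w -= lr * grad / max(len(preferences), 1)
    return w

# Toy usage: two 3-step dialogs with 2-dimensional step features;
# the first sequence is rated higher than the second.
seqs = [np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]),
        np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]])]
w_hat = fit_reward(seqs, preferences=[(0, 1)], dim=2)
print("estimated reward weights:", w_hat)

Under these assumptions the learned weights simply push the returns of preferred sequences above those of less-preferred ones; the paper's method may differ in reward parameterization and optimization.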