Preference-learning based Inverse Reinforcement Learning for Dialog Control

Cited by: 0
Authors
Sugiyama, Hiroaki [1]
Meguro, Toyomi [1]
Minami, Yasuhiro [1]
Affiliations
[1] NTT Commun Sci Labs, Kyoto, Japan
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Dialog systems that realize dialog control with reinforcement learning have recently been proposed. However, reinforcement learning has an open problem: it requires a reward function that is difficult to set appropriately. To set an appropriate reward function automatically, we propose preference-learning based inverse reinforcement learning (PIRL), which estimates a reward function from dialog sequences and their pairwise preferences, computed from ratings annotated on the sequences. Inverse reinforcement learning finds a reward function with which a system generates sequences similar to the training ones. This means that current IRL assumes the training sequences are all equally appropriate for a given task, and thus it cannot exploit the ratings. In contrast, our PIRL can exploit the pairwise preferences derived from the ratings to estimate the reward function. We examine the advantages of PIRL through comparisons with competitive algorithms that have been widely used for dialog control. Our experiments show that PIRL outperforms the other algorithms and has the potential to serve as an evaluation simulator for dialog control.
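The core idea the abstract describes, learning a reward function from pairwise preferences over sequences rather than treating all demonstrations as equally good, can be illustrated with a minimal sketch. The sketch below is not the paper's algorithm: it assumes each dialog sequence is summarized by a hypothetical feature vector and fits linear reward weights with a Bradley-Terry-style logistic model, where a preferred sequence should receive a higher cumulative reward than the one it was rated above.

```python
import numpy as np

def fit_preference_reward(features, prefs, lr=0.1, epochs=500):
    """Fit linear reward weights from pairwise sequence preferences.

    features: (n_seq, d) array; each row is the summed feature vector of
        one dialog sequence (a hypothetical representation, not the
        paper's exact one).
    prefs: list of (i, j) pairs meaning sequence i was rated above j.

    Models P(i preferred over j) = sigmoid(w . (f_i - f_j)) and maximizes
    the log-likelihood of the observed preferences by gradient ascent.
    """
    w = np.zeros(features.shape[1])
    for _ in range(epochs):
        grad = np.zeros_like(w)
        for i, j in prefs:
            diff = features[i] - features[j]
            p = 1.0 / (1.0 + np.exp(-(w @ diff)))  # P(i > j) under w
            grad += (1.0 - p) * diff               # log-likelihood gradient
        w += lr * grad / max(len(prefs), 1)
    return w

# Three toy sequences; ratings say 0 is best, then 1, then 2.
features = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
prefs = [(0, 1), (1, 2), (0, 2)]
w = fit_preference_reward(features, prefs)
scores = features @ w  # learned reward should reproduce the ranking
```

Under this model, the learned reward orders the sequences consistently with the annotated preferences, which is exactly the property that lets preference information shape the reward where plain IRL would treat all training sequences alike.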
Pages: 222 - 225
Page count: 4
Related Papers (50 in total)
  • [41] Online Observer-Based Inverse Reinforcement Learning
    Self, Ryan
    Coleman, Kevin
    Bai, He
    Kamalapurkar, Rushikesh
    2021 AMERICAN CONTROL CONFERENCE (ACC), 2021, : 1959 - 1964
  • [42] Online Observer-Based Inverse Reinforcement Learning
    Self, Ryan
    Coleman, Kevin
    Bai, He
    Kamalapurkar, Rushikesh
    IEEE CONTROL SYSTEMS LETTERS, 2021, 5 (06): : 1922 - 1927
  • [43] A reinforcement learning approach for thermostat setpoint preference learning
    Elehwany, Hussein
    Ouf, Mohamed
    Gunay, Burak
    Cotrufo, Nunzio
    Venne, Jean-Simon
    BUILDING SIMULATION, 2024, 17 (01) : 131 - 146
  • [45] Reinforcement Learning and Inverse Reinforcement Learning with System 1 and System 2
    Peysakhovich, Alexander
    AIES '19: PROCEEDINGS OF THE 2019 AAAI/ACM CONFERENCE ON AI, ETHICS, AND SOCIETY, 2019, : 409 - 415
  • [46] Approximate Inverse Reinforcement Learning from Vision-based Imitation Learning
    Lee, Keuntaek
    Vlahov, Bogdan
    Gibson, Jason
    Rehg, James M.
    Theodorou, Evangelos A.
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 10793 - 10799
  • [47] Bayesian Inverse Reinforcement Learning-based Reward Learning for Automated Driving
    Zeng, Di
    Zheng, Ling
    Li, Yinong
    Yang, Xiantong
    Jixie Gongcheng Xuebao/Journal of Mechanical Engineering, 2024, 60 (10): : 245 - 260
  • [48] Inverse Reinforcement Learning Control for Trajectory Tracking of a Multirotor UAV
    Choi, Seungwon
    Kim, Suseong
    Kim, H. Jin
    INTERNATIONAL JOURNAL OF CONTROL AUTOMATION AND SYSTEMS, 2017, 15 (04) : 1826 - 1834
  • [50] Partially Observable Reinforcement Learning for Dialog-based Interactive Recommendation
    Wu, Yaxiong
    Macdonald, Craig
    Ounis, Iadh
    15TH ACM CONFERENCE ON RECOMMENDER SYSTEMS (RECSYS 2021), 2021, : 241 - 251