Preference-learning based Inverse Reinforcement Learning for Dialog Control

Cited by: 0
Authors
Sugiyama, Hiroaki [1 ]
Meguro, Toyomi [1 ]
Minami, Yasuhiro [1 ]
Affiliations
[1] NTT Commun Sci Labs, Kyoto, Japan
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Dialog systems that realize dialog control with reinforcement learning have recently been proposed. However, reinforcement learning has an open problem: it requires a reward function that is difficult to set appropriately. To set an appropriate reward function automatically, we propose preference-learning based inverse reinforcement learning (PIRL), which estimates a reward function from dialog sequences and their pairwise preferences, computed from ratings annotated on the sequences. Inverse reinforcement learning finds a reward function under which a system generates sequences similar to the training ones. This means that current IRL assumes the training sequences are equally appropriate for a given task and therefore cannot exploit the ratings. In contrast, our PIRL can exploit pairwise preferences derived from the ratings to estimate the reward function. We examine the advantages of PIRL through comparisons with competing algorithms that are widely used for dialog control. Our experiments show that PIRL outperforms the other algorithms and has the potential to serve as an evaluation simulator for dialog control.
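The abstract does not specify how PIRL fits a reward function to pairwise preferences. As a rough illustration of the general idea only (not the paper's actual method), the sketch below fits a linear reward over trajectory feature vectors with a Bradley-Terry style logistic preference loss; the function name, the linear-reward form, and all parameters are assumptions for illustration.

```python
import numpy as np

def preference_reward_learning(feature_pairs, lr=0.1, epochs=200):
    """Estimate linear reward weights w so that, for each pair
    (phi_preferred, phi_other) of trajectory feature vectors, the
    preferred trajectory gets the higher total reward w @ phi.
    Uses gradient ascent on the Bradley-Terry log-likelihood."""
    dim = feature_pairs[0][0].shape[0]
    w = np.zeros(dim)
    for _ in range(epochs):
        for phi_pref, phi_other in feature_pairs:
            # P(preferred beats other) = sigmoid(w·phi_pref - w·phi_other)
            margin = w @ (phi_pref - phi_other)
            p = 1.0 / (1.0 + np.exp(-margin))
            # Gradient of the log-likelihood of the observed preference
            w += lr * (1.0 - p) * (phi_pref - phi_other)
    return w

# Hypothetical usage: each dialog sequence is summarized by feature
# counts; pairs are ordered (higher-rated, lower-rated).
pairs = [
    (np.array([2.0, 0.0]), np.array([0.0, 2.0])),
    (np.array([1.5, 0.5]), np.array([0.5, 1.5])),
]
w = preference_reward_learning(pairs)
```

Under this sketch, the learned weights assign higher reward to trajectories resembling the higher-rated ones, which is the property the abstract attributes to PIRL's estimated reward function.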
Pages: 222-225
Page count: 4