Preference-learning based Inverse Reinforcement Learning for Dialog Control

Cited by: 0
Authors:
Sugiyama, Hiroaki [1 ]
Meguro, Toyomi [1 ]
Minami, Yasuhiro [1 ]
Affiliations:
[1] NTT Commun Sci Labs, Kyoto, Japan
Keywords: (none listed)
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract:
Dialog systems that realize dialog control with reinforcement learning have recently been proposed. However, reinforcement learning has an open problem: it requires a reward function that is difficult to set appropriately. To set an appropriate reward function automatically, we propose preference-learning based inverse reinforcement learning (PIRL), which estimates a reward function from dialog sequences and their pairwise preferences, computed from ratings annotated on the sequences. Inverse reinforcement learning (IRL) finds a reward function under which a system generates sequences similar to the training ones. This means that current IRL assumes the training sequences are equally appropriate for a given task and thus cannot exploit the ratings. In contrast, PIRL uses the pairwise preferences derived from the ratings to estimate the reward function. We examine the advantages of PIRL through comparisons with competitive algorithms that are widely used for dialog control. Our experiments show that PIRL outperforms the other algorithms and has the potential to serve as an evaluation simulator for dialog control.
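The core idea the abstract describes, estimating a reward function from pairwise preferences over rated dialog sequences, can be illustrated with a generic preference-based reward-learning sketch. This is a minimal illustration, not the paper's PIRL algorithm: it assumes a reward linear in per-state features and a Bradley-Terry model that maps return differences to preference probabilities; the function names and toy data below are hypothetical.

```python
import numpy as np

# Minimal sketch of pairwise-preference reward learning (NOT the paper's
# exact PIRL formulation). Assumptions: reward linear in state features,
# r(s) = w . phi(s); a sequence's return is the sum of its feature vectors;
# a Bradley-Terry model, P(a > b) = sigmoid(w.phi_a - w.phi_b), links
# return differences to the annotated pairwise preferences.

def traj_features(trajectory):
    """Sum per-state feature vectors phi(s_t) over a dialog sequence."""
    return np.sum(trajectory, axis=0)  # trajectory: (T, d) array

def preference_nll(w, pairs):
    """Negative log-likelihood of the observed pairwise preferences.

    pairs: list of (phi_a, phi_b) where sequence a was rated above b.
    """
    # -log sigmoid(margin) computed stably as log(1 + exp(-margin)).
    return sum(np.logaddexp(0.0, -(w @ (pa - pb))) for pa, pb in pairs)

def fit_reward(pairs, dim, lr=0.1, steps=500):
    """Gradient descent on the preference NLL to estimate reward weights."""
    w = np.zeros(dim)
    for _ in range(steps):
        grad = np.zeros(dim)
        for phi_a, phi_b in pairs:
            d = phi_a - phi_b
            p = 1.0 / (1.0 + np.exp(-(w @ d)))  # P(a > b) under current w
            grad += (p - 1.0) * d               # gradient of -log P(a > b)
        w -= lr * grad / len(pairs)
    return w

# Toy usage: two 2-feature dialog sequences, the first rated higher.
seq_hi = np.array([[1.0, 0.0], [1.0, 0.2]])
seq_lo = np.array([[0.0, 1.0], [0.1, 1.0]])
pairs = [(traj_features(seq_hi), traj_features(seq_lo))]
w = fit_reward(pairs, dim=2)
print("estimated reward weights:", w)          # favors the first feature
print("preference NLL:", preference_nll(w, pairs))
```

Under these assumptions, maximizing the preference likelihood pushes the reward weights toward features that the higher-rated sequences exhibit, which is the sense in which ratings, unused by standard IRL, can shape the estimated reward.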
Pages: 222-225
Number of pages: 4