Preference-learning based Inverse Reinforcement Learning for Dialog Control

Cited by: 0
Authors
Sugiyama, Hiroaki [1]
Meguro, Toyomi [1]
Minami, Yasuhiro [1]
Affiliations
[1] NTT Commun Sci Labs, Kyoto, Japan
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Dialog systems that realize dialog control with reinforcement learning have recently been proposed. However, reinforcement learning has an open problem: it requires a reward function that is difficult to set appropriately. To set an appropriate reward function automatically, we propose preference-learning based inverse reinforcement learning (PIRL), which estimates a reward function from dialog sequences and their pairwise preferences, computed from ratings annotated on the sequences. Inverse reinforcement learning finds a reward function with which a system generates sequences similar to the training ones. This means that current IRL assumes the training sequences are all equally appropriate for a given task, and thus it cannot exploit the ratings. In contrast, our PIRL can exploit the pairwise preferences derived from the ratings to estimate the reward function. We examine the advantages of PIRL through comparisons with competitive algorithms that have been widely used for dialog control. Our experiments show that PIRL outperforms the other algorithms and has the potential to serve as an evaluation simulator for dialog control.
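The core idea the abstract describes, learning a reward function from pairwise preferences over sequences rather than treating all demonstrations as equally good, can be illustrated with a minimal sketch. The sketch below is not the paper's algorithm: it assumes each dialog sequence is summarized by a hypothetical feature vector and fits linear reward weights with a Bradley-Terry-style logistic model, where a preferred sequence should receive a higher cumulative reward than the one it was rated above.

```python
import numpy as np

def fit_preference_reward(features, prefs, lr=0.1, epochs=500):
    """Fit linear reward weights from pairwise sequence preferences.

    features: (n_seq, d) array; each row is the summed feature vector of
        one dialog sequence (a hypothetical representation, not the
        paper's exact one).
    prefs: list of (i, j) pairs meaning sequence i was rated above j.

    Models P(i preferred over j) = sigmoid(w . (f_i - f_j)) and maximizes
    the log-likelihood of the observed preferences by gradient ascent.
    """
    w = np.zeros(features.shape[1])
    for _ in range(epochs):
        grad = np.zeros_like(w)
        for i, j in prefs:
            diff = features[i] - features[j]
            p = 1.0 / (1.0 + np.exp(-(w @ diff)))  # P(i > j) under w
            grad += (1.0 - p) * diff               # log-likelihood gradient
        w += lr * grad / max(len(prefs), 1)
    return w

# Three toy sequences; ratings say 0 is best, then 1, then 2.
features = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
prefs = [(0, 1), (1, 2), (0, 2)]
w = fit_preference_reward(features, prefs)
scores = features @ w  # learned reward should reproduce the ranking
```

Under this model, the learned reward orders the sequences consistently with the annotated preferences, which is exactly the property that lets preference information shape the reward where plain IRL would treat all training sequences alike.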
Pages: 222 - 225
Page count: 4
Related Papers (50 in total)
  • [41] Online Observer-Based Inverse Reinforcement Learning
    Self, Ryan
    Coleman, Kevin
    Bai, He
    Kamalapurkar, Rushikesh
    2021 AMERICAN CONTROL CONFERENCE (ACC), 2021, : 1959 - 1964
  • [42] Online Observer-Based Inverse Reinforcement Learning
    Self, Ryan
    Coleman, Kevin
    Bai, He
    Kamalapurkar, Rushikesh
    IEEE CONTROL SYSTEMS LETTERS, 2021, 5 (06): : 1922 - 1927
  • [43] A reinforcement learning approach for thermostat setpoint preference learning
    Elehwany, Hussein
    Ouf, Mohamed
    Gunay, Burak
    Cotrufo, Nunzio
    Venne, Jean-Simon
    BUILDING SIMULATION, 2024, 17 (01) : 131 - 146
  • [45] Reinforcement Learning and Inverse Reinforcement Learning with System 1 and System 2
    Peysakhovich, Alexander
    AIES '19: PROCEEDINGS OF THE 2019 AAAI/ACM CONFERENCE ON AI, ETHICS, AND SOCIETY, 2019, : 409 - 415
  • [46] Approximate Inverse Reinforcement Learning from Vision-based Imitation Learning
    Lee, Keuntaek
    Vlahov, Bogdan
    Gibson, Jason
    Rehg, James M.
    Theodorou, Evangelos A.
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 10793 - 10799
  • [47] Bayesian Inverse Reinforcement Learning-based Reward Learning for Automated Driving
    Zeng, Di
    Zheng, Ling
    Li, Yinong
    Yang, Xiantong
    Jixie Gongcheng Xuebao/Journal of Mechanical Engineering, 2024, 60 (10): : 245 - 260
  • [48] Inverse Reinforcement Learning Control for Trajectory Tracking of a Multirotor UAV
    Choi, Seungwon
    Kim, Suseong
    Kim, H. Jin
    INTERNATIONAL JOURNAL OF CONTROL AUTOMATION AND SYSTEMS, 2017, 15 (04) : 1826 - 1834
  • [50] Partially Observable Reinforcement Learning for Dialog-based Interactive Recommendation
    Wu, Yaxiong
    Macdonald, Craig
    Ounis, Iadh
    15TH ACM CONFERENCE ON RECOMMENDER SYSTEMS (RECSYS 2021), 2021, : 241 - 251