Reward Identification in Inverse Reinforcement Learning

Cited by: 0
Authors
Kim, Kuno [1 ]
Garg, Shivam [1 ]
Shiragur, Kirankumar [1 ]
Ermon, Stefano [1 ]
Affiliations
[1] Stanford Univ, Dept Comp Sci, Palo Alto, CA 94304 USA
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139 | 2021 / Vol. 139
Keywords
DYNAMIC DISCRETE-CHOICE; MODELS
DOI
Not available
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
We study the problem of reward identifiability in the context of Inverse Reinforcement Learning (IRL). The reward identifiability question is critical when assessing whether Markov Decision Processes (MDPs) are effective computational models of real-world decision makers, both for understanding complex decision-making behavior and for performing counterfactual reasoning. While identifiability has been acknowledged as a fundamental theoretical question in IRL, little is known about the types of MDPs for which rewards are identifiable, or even whether such MDPs exist. In this work, we formalize the reward identification problem in IRL and study how identifiability relates to properties of the MDP model. For deterministic MDP models with the MaxEnt RL objective, we prove necessary and sufficient conditions for identifiability. Building on these results, we present efficient algorithms for testing whether an MDP model is identifiable.
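To make the non-identifiability issue concrete, here is a minimal Python sketch (not code from the paper): in MaxEnt RL, the optimal stochastic policy pi(a|s) = exp(Q(s,a) - V(s)) is invariant to potential-based reward shaping r'(s,a) = r(s,a) + gamma*Phi(s') - Phi(s), so two distinct rewards can induce identical behavior. The MDP size, random seed, and potential Phi below are arbitrary illustrative choices, and maxent_policy is a hypothetical helper implementing standard soft value iteration.

```python
import numpy as np

n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)

# A random deterministic MDP: next_state[s, a] is the successor of (s, a).
next_state = rng.integers(n_states, size=(n_states, n_actions))
reward = rng.normal(size=(n_states, n_actions))

def maxent_policy(r, iters=2000):
    """Soft value iteration; returns the MaxEnt-optimal policy pi(a|s)."""
    v = np.zeros(n_states)
    for _ in range(iters):
        q = r + gamma * v[next_state]        # soft Q backup
        v = np.logaddexp.reduce(q, axis=1)   # V(s) = log sum_a exp Q(s,a)
    q = r + gamma * v[next_state]
    return np.exp(q - v[:, None])            # pi(a|s) = exp(Q(s,a) - V(s))

phi = rng.normal(size=n_states)              # arbitrary potential Phi
shaped = reward + gamma * phi[next_state] - phi[:, None]

# The shaped reward differs from the original, yet induces the same policy.
print(np.allclose(maxent_policy(reward), maxent_policy(shaped)))  # True
```

Since both rewards induce the same policy, no amount of demonstration data can distinguish them; this is exactly the kind of ambiguity that identifiability conditions must rule out.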
Pages: 10