Reward Identification in Inverse Reinforcement Learning

Cited by: 0
Authors
Kim, Kuno [1 ]
Garg, Shivam [1 ]
Shiragur, Kirankumar [1 ]
Ermon, Stefano [1 ]
Affiliations
[1] Stanford Univ, Dept Comp Sci, Palo Alto, CA 94304 USA
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139 | 2021 / Vol. 139
Keywords
DYNAMIC DISCRETE-CHOICE; MODELS
DOI
Not available
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
We study the problem of reward identifiability in the context of Inverse Reinforcement Learning (IRL). The reward identifiability question is critical when assessing whether Markov Decision Processes (MDPs) are effective computational models of real-world decision makers, both for understanding complex decision-making behavior and for performing counterfactual reasoning. While identifiability has been acknowledged as a fundamental theoretical question in IRL, little is known about the types of MDPs for which rewards are identifiable, or even whether such MDPs exist. In this work, we formalize the reward identification problem in IRL and study how identifiability relates to properties of the MDP model. For deterministic MDP models with the MaxEnt RL objective, we prove necessary and sufficient conditions for identifiability. Building on these results, we present efficient algorithms for testing whether an MDP model is identifiable.
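To make the non-identifiability issue concrete, here is a minimal Python sketch (not code from the paper): in MaxEnt RL, the optimal stochastic policy pi(a|s) = exp(Q(s,a) - V(s)) is invariant to potential-based reward shaping r'(s,a) = r(s,a) + gamma*Phi(s') - Phi(s), so two distinct rewards can induce identical behavior. The MDP size, random seed, and potential Phi below are arbitrary illustrative choices, and maxent_policy is a hypothetical helper implementing standard soft value iteration.

```python
import numpy as np

n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)

# A random deterministic MDP: next_state[s, a] is the successor of (s, a).
next_state = rng.integers(n_states, size=(n_states, n_actions))
reward = rng.normal(size=(n_states, n_actions))

def maxent_policy(r, iters=2000):
    """Soft value iteration; returns the MaxEnt-optimal policy pi(a|s)."""
    v = np.zeros(n_states)
    for _ in range(iters):
        q = r + gamma * v[next_state]        # soft Q backup
        v = np.logaddexp.reduce(q, axis=1)   # V(s) = log sum_a exp Q(s,a)
    q = r + gamma * v[next_state]
    return np.exp(q - v[:, None])            # pi(a|s) = exp(Q(s,a) - V(s))

phi = rng.normal(size=n_states)              # arbitrary potential Phi
shaped = reward + gamma * phi[next_state] - phi[:, None]

# The shaped reward differs from the original, yet induces the same policy.
print(np.allclose(maxent_policy(reward), maxent_policy(shaped)))  # True
```

Since both rewards induce the same policy, no amount of demonstration data can distinguish them; this is exactly the kind of ambiguity that identifiability conditions must rule out.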
Pages: 10