Bayesian Learning of Noisy Markov Decision Processes

Cited by: 4

Authors:

Singh, Sumeetpal S. [1]

Chopin, Nicolas [2,3]

Whiteley, Nick [4]

Affiliations:

[1] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England

[2] CREST ENSAE, Paris, France

[3] HEC Paris, Paris, France

[4] Univ Bristol, Sch Math, Bristol BS8 1TW, Avon, England

Source:

ACM TRANSACTIONS ON MODELING AND COMPUTER SIMULATION | 2013 / Vol. 23 / Issue 01

Keywords:

Data augmentation; parameter expansion; Markov chain Monte Carlo; Markov decision process; Bayesian inference; model; algorithm

DOI:

10.1145/2414416.2414420

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

We consider the inverse reinforcement learning problem, that is, the problem of learning from, and then predicting or mimicking, a controller based on state/action data. We propose a statistical model for such data, derived from the structure of a Markov decision process. Adopting a Bayesian approach to inference, we show how latent variables of the model can be estimated, and how predictions about actions can be made, in a unified framework. A new Markov chain Monte Carlo (MCMC) sampler is devised for simulation from the posterior distribution. The sampler includes a parameter expansion step, which is shown to be essential for good convergence properties of the MCMC sampler. As an illustration, the method is applied to learning a human controller.

Pages: 25

50 items in total

[41] Markov decision processes
White, D.J.
Journal of the Operational Research Society, 1995, 46 (06):
[42] Markov Decision Processes
Bäuerle N.
Rieder U.
Jahresbericht der Deutschen Mathematiker-Vereinigung, 2010, 112 (4) : 217 - 243
[43] Noisy Bayesian Active Learning
Naghshvar, Mohammad
Javidi, Tara
Chaudhuri, Kamalika
2012 50TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2012, : 1626 - 1633
[44] Information-directed policy sampling for episodic Bayesian Markov decision processes
Diaz, Victoria
Ghate, Archis
IISE TRANSACTIONS, 2024,
[45] Learning deterministic policies in partially observable Markov decision processes
Miyazaki, K
Kobayashi, S
INTELLIGENT AUTONOMOUS SYSTEMS: IAS-5, 1998, : 250 - 257
[46] Counterexample Explanation by Learning Small Strategies in Markov Decision Processes
Brazdil, Tomas
Chatterjee, Krishnendu
Chmelik, Martin
Fellner, Andreas
Kretinsky, Jan
COMPUTER AIDED VERIFICATION, PT I, 2015, 9206 : 158 - 177
[47] REINFORCEMENT LEARNING OF NON-MARKOV DECISION-PROCESSES
WHITEHEAD, SD
LIN, LJ
ARTIFICIAL INTELLIGENCE, 1995, 73 (1-2) : 271 - 306
[48] Learning in Non-Cooperative Configurable Markov Decision Processes
Ramponi, Giorgia
Metelli, Alberto Maria
Concetti, Alessandro
Restelli, Marcello
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
[49] L*-based learning of Markov decision processes (extended version)
Tappler, Martin
Aichernig, Bernhard K.
Bacci, Giovanni
Eichlseder, Maria
Larsen, Kim G.
FORMAL ASPECTS OF COMPUTING, 2021, 33 (4-5) : 575 - 615
[50] Permissive Supervisor Synthesis for Markov Decision Processes Through Learning
Wu, Bo
Zhang, Xiaobin
Lin, Hai
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2019, 64 (08) : 3332 - 3338
