Bayesian Learning of Noisy Markov Decision Processes

Cited: 4
Authors
Singh, Sumeetpal S. [1 ]
Chopin, Nicolas [2 ,3 ]
Whiteley, Nick [4 ]
Affiliations
[1] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
[2] CREST ENSAE, Paris, France
[3] HEC Paris, Paris, France
[4] Univ Bristol, Sch Math, Bristol BS8 1TW, Avon, England
Keywords
Data augmentation; parameter expansion; Markov chain Monte Carlo; Markov decision process; Bayesian inference; DATA AUGMENTATION; MODEL; ALGORITHM
DOI
10.1145/2414416.2414420
CLC number
TP39 [Computer Applications]
Subject classification codes
081203; 0835
Abstract
We consider the inverse reinforcement learning problem, that is, the problem of learning from, and then predicting or mimicking, a controller based on state/action data. We propose a statistical model for such data, derived from the structure of a Markov decision process. Adopting a Bayesian approach to inference, we show how latent variables of the model can be estimated, and how predictions about actions can be made, in a unified framework. A new Markov chain Monte Carlo (MCMC) sampler is devised for simulation from the posterior distribution. The sampler includes a parameter expansion step, which is shown to be essential for its good convergence properties. As an illustration, the method is applied to learning a human controller.
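The paper's sampler is specific to its MDP-based model, which is not reproduced in this record. As a loose illustration of the general technique the abstract names, the following is a minimal sketch of parameter-expanded data augmentation (PX-DA) on a toy probit model: impute latent variables, rescale them through an expansion parameter, then update the model parameters. The probit setting, priors, and all variable names are assumptions made for illustration, not the authors' construction.

```python
# Minimal PX-DA sketch on a toy probit model (illustrative only; not the
# paper's MDP model or sampler).
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)

# Synthetic binary "action" data y driven by features X.
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)

XtX_inv = np.linalg.inv(X.T @ X)
chol = np.linalg.cholesky(XtX_inv)
H = X @ XtX_inv @ X.T                      # hat matrix

beta = np.zeros(p)
draws = []
for it in range(2000):
    # 1. Data augmentation: impute latent utilities z_i ~ N(x_i'beta, 1),
    #    truncated to agree with the observed y_i.
    mu = X @ beta
    lo = np.where(y == 1, -mu, -np.inf)
    hi = np.where(y == 1, np.inf, -mu)
    z = mu + truncnorm.rvs(lo, hi, size=n, random_state=rng)

    # 2. Parameter expansion: draw a scale g^2 under the expanded model and
    #    rescale the latent variables; this move is what improves mixing.
    rss = z @ (z - H @ z)
    g2 = rss / rng.chisquare(n)
    z = z / np.sqrt(g2)

    # 3. Conditional update of beta given the rescaled latent variables
    #    (flat prior, so the full conditional is Gaussian).
    beta = XtX_inv @ (X.T @ z) + chol @ rng.normal(size=p)
    draws.append(beta.copy())

print("posterior mean:", np.mean(draws[500:], axis=0))
```

Dropping step 2 gives the plain data-augmentation sampler, which mixes noticeably more slowly; the rescaling step plays the same role as the parameter expansion step emphasized in the abstract.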
Pages: 25
Related Papers
50 records in total
  • [31] Concurrent Markov decision processes for robot team learning
    Girard, Justin
    Emami, M. Reza
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2015, 39 : 223 - 234
  • [32] Learning Adversarial Markov Decision Processes with Delayed Feedback
    Lancewicki, Tal
    Rosenberg, Aviv
    Mansour, Yishay
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 7281 - 7289
  • [33] A reinforcement learning based algorithm for Markov decision processes
    Bhatnagar, S
    Kumar, S
    2005 International Conference on Intelligent Sensing and Information Processing, Proceedings, 2005, : 199 - 204
  • [34] Verification of Markov Decision Processes Using Learning Algorithms
    Brazdil, Tomas
    Chatterjee, Krishnendu
    Chmelik, Martin
    Forejt, Vojtech
    Kretinsky, Jan
    Kwiatkowska, Marta
    Parker, David
    Ujma, Mateusz
    AUTOMATED TECHNOLOGY FOR VERIFICATION AND ANALYSIS, ATVA 2014, 2014, 8837 : 98 - 114
  • [35] Recursive learning automata approach to Markov decision processes
    Chang, Hyeong Soo
    Fu, Michael C.
    Hu, Jiaqiao
    Marcus, Steven I.
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2007, 52 (07) : 1349 - 1355
  • [36] Learning and Planning with Timing Information in Markov Decision Processes
    Bacon, Pierre-Luc
    Balle, Borja
    Precup, Doina
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2015, : 111 - 120
  • [37] Learning algorithms for Markov decision processes with average cost
    Abounadi, J
    Bertsekas, D
    Borkar, VS
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2001, 40 (03) : 681 - 698
  • [38] Combining Learning Algorithms: An Approach to Markov Decision Processes
    Ribeiro, Richardson
    Favarim, Fabio
    Barbosa, Marco A. C.
    Koerich, Alessandro L.
    Enembreck, Fabricio
    ENTERPRISE INFORMATION SYSTEMS, ICEIS 2012, 2013, 141 : 172 - 188
  • [39] A sensitivity view of Markov decision processes and reinforcement learning
    Cao, XR
    MODELING, CONTROL AND OPTIMIZATION OF COMPLEX SYSTEMS: IN HONOR OF PROFESSOR YU-CHI HO, 2003, 14 : 261 - 283
  • [40] Online Learning in Markov Decision Processes with Continuous Actions
    Hong, Yi-Te
    Lu, Chi-Jen
    ALGORITHMIC LEARNING THEORY, ALT 2015, 2015, 9355 : 302 - 316