Bayesian Learning of Noisy Markov Decision Processes

Cited: 4
Authors
Singh, Sumeetpal S. [1 ]
Chopin, Nicolas [2 ,3 ]
Whiteley, Nick [4 ]
Affiliations
[1] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
[2] CREST ENSAE, Paris, France
[3] HEC Paris, Paris, France
[4] Univ Bristol, Sch Math, Bristol BS8 1TW, Avon, England
Keywords
Data augmentation; parameter expansion; Markov chain Monte Carlo; Markov decision process; Bayesian inference; DATA AUGMENTATION; MODEL; ALGORITHM
DOI
10.1145/2414416.2414420
CLC number
TP39 [Computer Applications]
Subject classification codes
081203; 0835
Abstract
We consider the inverse reinforcement learning problem, that is, the problem of learning from, and then predicting or mimicking, a controller based on state/action data. We propose a statistical model for such data, derived from the structure of a Markov decision process. Adopting a Bayesian approach to inference, we show how latent variables of the model can be estimated, and how predictions about actions can be made, in a unified framework. A new Markov chain Monte Carlo (MCMC) sampler is devised for simulation from the posterior distribution. The sampler includes a parameter expansion step, which is shown to be essential for its good convergence properties. As an illustration, the method is applied to learning a human controller.
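The paper's sampler is specific to its MDP-based model, which is not reproduced in this record. As a loose illustration of the general technique the abstract names, the following is a minimal sketch of parameter-expanded data augmentation (PX-DA) on a toy probit model: impute latent variables, rescale them through an expansion parameter, then update the model parameters. The probit setting, priors, and all variable names are assumptions made for illustration, not the authors' construction.

```python
# Minimal PX-DA sketch on a toy probit model (illustrative only; not the
# paper's MDP model or sampler).
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)

# Synthetic binary "action" data y driven by features X.
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)

XtX_inv = np.linalg.inv(X.T @ X)
chol = np.linalg.cholesky(XtX_inv)
H = X @ XtX_inv @ X.T                      # hat matrix

beta = np.zeros(p)
draws = []
for it in range(2000):
    # 1. Data augmentation: impute latent utilities z_i ~ N(x_i'beta, 1),
    #    truncated to agree with the observed y_i.
    mu = X @ beta
    lo = np.where(y == 1, -mu, -np.inf)
    hi = np.where(y == 1, np.inf, -mu)
    z = mu + truncnorm.rvs(lo, hi, size=n, random_state=rng)

    # 2. Parameter expansion: draw a scale g^2 under the expanded model and
    #    rescale the latent variables; this move is what improves mixing.
    rss = z @ (z - H @ z)
    g2 = rss / rng.chisquare(n)
    z = z / np.sqrt(g2)

    # 3. Conditional update of beta given the rescaled latent variables
    #    (flat prior, so the full conditional is Gaussian).
    beta = XtX_inv @ (X.T @ z) + chol @ rng.normal(size=p)
    draws.append(beta.copy())

print("posterior mean:", np.mean(draws[500:], axis=0))
```

Dropping step 2 gives the plain data-augmentation sampler, which mixes noticeably more slowly; the rescaling step plays the same role as the parameter expansion step emphasized in the abstract.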
Pages: 25
Related Papers
50 records in total
  • [31] Concurrent Markov decision processes for robot team learning
    Girard, Justin
    Emami, M. Reza
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2015, 39 : 223 - 234
  • [32] Learning Adversarial Markov Decision Processes with Delayed Feedback
    Lancewicki, Tal
    Rosenberg, Aviv
    Mansour, Yishay
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 7281 - 7289
  • [33] A reinforcement learning based algorithm for Markov decision processes
    Bhatnagar, S
    Kumar, S
    2005 International Conference on Intelligent Sensing and Information Processing, Proceedings, 2005, : 199 - 204
  • [34] Verification of Markov Decision Processes Using Learning Algorithms
    Brazdil, Tomas
    Chatterjee, Krishnendu
    Chmelik, Martin
    Forejt, Vojtech
    Kretinsky, Jan
    Kwiatkowska, Marta
    Parker, David
    Ujma, Mateusz
    AUTOMATED TECHNOLOGY FOR VERIFICATION AND ANALYSIS, ATVA 2014, 2014, 8837 : 98 - 114
  • [35] Recursive learning automata approach to Markov decision processes
    Chang, Hyeong Soo
    Fu, Michael C.
    Hu, Jiaqiao
    Marcus, Steven I.
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2007, 52 (07) : 1349 - 1355
  • [36] Learning and Planning with Timing Information in Markov Decision Processes
    Bacon, Pierre-Luc
    Balle, Borja
    Precup, Doina
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2015, : 111 - 120
  • [37] Learning algorithms for Markov decision processes with average cost
    Abounadi, J
    Bertsekas, D
    Borkar, VS
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2001, 40 (03) : 681 - 698
  • [38] Combining Learning Algorithms: An Approach to Markov Decision Processes
    Ribeiro, Richardson
    Favarim, Fabio
    Barbosa, Marco A. C.
    Koerich, Alessandro L.
    Enembreck, Fabricio
    ENTERPRISE INFORMATION SYSTEMS, ICEIS 2012, 2013, 141 : 172 - 188
  • [39] A sensitivity view of Markov decision processes and reinforcement learning
    Cao, XR
    MODELING, CONTROL AND OPTIMIZATION OF COMPLEX SYSTEMS: IN HONOR OF PROFESSOR YU-CHI HO, 2003, 14 : 261 - 283
  • [40] Online Learning in Markov Decision Processes with Continuous Actions
    Hong, Yi-Te
    Lu, Chi-Jen
    ALGORITHMIC LEARNING THEORY, ALT 2015, 2015, 9355 : 302 - 316