Bayesian Learning of Noisy Markov Decision Processes

Cited by: 4
Authors
Singh, Sumeetpal S. [1 ]
Chopin, Nicolas [2 ,3 ]
Whiteley, Nick [4 ]
Affiliations
[1] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
[2] CREST ENSAE, Paris, France
[3] HEC Paris, Paris, France
[4] Univ Bristol, Sch Math, Bristol BS8 1TW, Avon, England
Keywords
Data augmentation; parameter expansion; Markov chain Monte Carlo; Markov decision process; Bayesian inference
DOI
10.1145/2414416.2414420
CLC number
TP39 [Computer Applications]
Subject classification codes
081203; 0835
Abstract
We consider the inverse reinforcement learning problem, that is, the problem of learning from, and then predicting or mimicking, a controller based on state/action data. We propose a statistical model for such data, derived from the structure of a Markov decision process. Adopting a Bayesian approach to inference, we show how latent variables of the model can be estimated and how predictions about actions can be made in a unified framework. A new Markov chain Monte Carlo (MCMC) sampler is devised for simulation from the posterior distribution. The sampler includes a parameter expansion step, which is shown to be essential for its good convergence properties. As an illustration, the method is applied to learning a human controller.
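The abstract's pipeline (observe state/action data from a noisy controller, posit a probabilistic model of action choice, sample the unknown parameter from its posterior by MCMC) can be illustrated with a minimal toy sketch. Everything below is an illustrative assumption rather than the paper's actual construction: the two-state setting, the softmax ("noisy") action model, the standard-normal prior, the random-walk Metropolis sampler, and the names `action_logits` and `log_post` are all hypothetical stand-ins, and in particular no parameter expansion step is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def action_logits(theta, state):
    # Hypothetical utility: in state 1 the agent prefers action 1
    # with strength theta; in state 0 it prefers action 0.
    return np.array([0.0, theta if state == 1 else -theta])

def log_lik(theta, data):
    # Softmax ("noisy") choice model over the observed (state, action) pairs.
    ll = 0.0
    for s, a in data:
        z = action_logits(theta, s)
        ll += z[a] - np.log(np.exp(z).sum())
    return ll

def log_post(theta, data):
    # Unnormalized log posterior with a N(0, 1) prior on theta.
    return log_lik(theta, data) - 0.5 * theta**2

# Simulate state/action data from a "true" controller parameter.
true_theta = 1.5
data = []
for _ in range(200):
    s = int(rng.integers(2))
    z = action_logits(true_theta, s)
    p = np.exp(z - z.max())
    p /= p.sum()
    data.append((s, int(rng.choice(2, p=p))))

# Random-walk Metropolis over theta.
theta, lp, samples = 0.0, log_post(0.0, data), []
for _ in range(3000):
    prop = theta + 0.3 * rng.standard_normal()
    lp_prop = log_post(prop, data)
    if np.log(rng.random()) < lp_prop - lp:  # accept/reject
        theta, lp = prop, lp_prop
    samples.append(theta)

post_mean = np.mean(samples[1000:])  # discard burn-in
print(post_mean)
```

The posterior mean should land near the data-generating value of 1.5, shrunk slightly toward zero by the prior; the paper's contribution addresses the harder setting where such a plain sampler mixes poorly, which motivates the parameter expansion step.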
Pages: 25
Related Papers
50 records in total
  • [21] An Argument for the Bayesian Control of Partially Observable Markov Decision Processes
    Vargo, Erik
    Cogill, Randy
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2014, 59 (10) : 2796 - 2800
  • [22] Using Linear Programming for Bayesian Exploration in Markov Decision Processes
    Castro, Pablo Samuel
    Precup, Doina
    20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007, : 2436 - 2441
  • [23] A Bayesian Network Approach to Control of Networked Markov Decision Processes
    Adlakha, Sachin
    Lall, Sanjay
    Goldsmith, Andrea
    2008 46TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING, VOLS 1-3, 2008, : 446 - +
  • [24] Bayesian Nonparametric Methods for Learning Markov Switching Processes
    Fox, Emily B.
    Sudderth, Erik B.
    Jordan, Michael I.
    Willsky, Alan S.
    IEEE SIGNAL PROCESSING MAGAZINE, 2010, 27 (06) : 43 - 54
  • [25] A Bayesian technique for task localization in Multiple Goal Markov Decision Processes
    Carroll, JL
    Seppi, K
    PROCEEDINGS OF THE 2004 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA'04), 2004, : 49 - 56
  • [26] Online Learning of Safety function for Markov Decision Processes
    Mazumdar, Abhijit
    Wisniewski, Rafal
    Bujorianu, Manuela L.
    2023 EUROPEAN CONTROL CONFERENCE, ECC, 2023,
  • [27] Learning Policies for Markov Decision Processes in Continuous Spaces
    Paternain, Santiago
    Bazerque, Juan Andres
    Small, Austin
    Ribeiro, Alejandro
    2018 IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2018, : 4751 - 4758
  • [28] Active Learning of Markov Decision Processes for System Verification
    Chen, Yingke
    Nielsen, Thomas Dyhre
    2012 11TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2012), VOL 2, 2012, : 289 - 294
  • [29] Active learning in partially observable Markov decision processes
    Jaulmes, R
    Pineau, J
    Precup, D
    MACHINE LEARNING: ECML 2005, PROCEEDINGS, 2005, 3720 : 601 - 608
  • [30] Learning Policies for Markov Decision Processes From Data
    Hanawal, Manjesh Kumar
    Liu, Hao
    Zhu, Henghui
    Paschalidis, Ioannis Ch.
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2019, 64 (06) : 2298 - 2309