Bayesian Learning of Noisy Markov Decision Processes

Cited: 4
Authors
Singh, Sumeetpal S. [1 ]
Chopin, Nicolas [2 ,3 ]
Whiteley, Nick [4 ]
Affiliations
[1] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
[2] CREST ENSAE, Paris, France
[3] HEC Paris, Paris, France
[4] Univ Bristol, Sch Math, Bristol BS8 1TW, Avon, England
Keywords
Data augmentation; parameter expansion; Markov chain Monte Carlo; Markov decision process; Bayesian inference
DOI
10.1145/2414416.2414420
CLC classification
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
We consider the inverse reinforcement learning problem, that is, the problem of learning from, and then predicting or mimicking, a controller on the basis of state/action data. We propose a statistical model for such data, derived from the structure of a Markov decision process. Adopting a Bayesian approach to inference, we show how latent variables of the model can be estimated and how predictions about actions can be made within a unified framework. A new Markov chain Monte Carlo (MCMC) sampler is devised for simulation from the posterior distribution. The sampler includes a parameter expansion step, which is shown to be essential for its good convergence properties. As an illustration, the method is applied to learning a human controller.
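The parameter expansion idea mentioned in the abstract can be illustrated on a toy problem. The sketch below is not the authors' model: it runs parameter-expanded data augmentation (PX-DA, in the spirit of Liu and Wu's marginal augmentation) on a simple probit model of binary actions. The data-generating setup, variable names, and the scale working parameter are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(1)

# Toy stand-in for state/action data: a scalar "state" x and a binary
# "action" y generated by a probit rule (illustrative, not the paper's model).
n = 200
x = rng.normal(size=n)
beta_true = 1.5
y = (x * beta_true + rng.normal(size=n) > 0).astype(int)

def px_da_probit(iters=1500, burn=500):
    """Parameter-expanded data augmentation (PX-DA) for probit regression."""
    beta = 0.0
    draws = []
    xtx = float(x @ x)
    for _ in range(iters):
        # Data-augmentation step: latent utilities z_i | beta, y, truncated
        # to the half-line consistent with the observed action y_i.
        mean = x * beta
        a = np.where(y == 1, -mean, -np.inf)   # standardized lower bounds
        b = np.where(y == 1, np.inf, -mean)    # standardized upper bounds
        z = truncnorm.rvs(a, b, loc=mean, scale=1.0, random_state=rng)
        # Parameter-expansion step: draw a scale "working parameter" from its
        # conditional under the expanded model and rescale the latents.
        # This extra move is what repairs the slow mixing of plain DA.
        beta_hat = (x @ z) / xtx
        rss = float((z - x * beta_hat) @ (z - x * beta_hat))
        alpha2 = rss / rng.chisquare(n - 1)
        z = z / np.sqrt(alpha2)
        # Gibbs update of beta | z under a flat prior.
        beta = (x @ z) / xtx + rng.normal() / np.sqrt(xtx)
        draws.append(beta)
    return np.array(draws[burn:])

samples = px_da_probit()
print(f"posterior mean of beta is roughly {samples.mean():.2f}")
```

The rescaling move leaves the target posterior invariant because it is a draw of the working parameter under the expanded model followed by a group action on the latents; without it, plain data augmentation for this model is known to mix slowly when the latents and parameters are strongly coupled.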
Pages: 25
Related papers
50 items in total
  • [41] Markov decision processes
    White, D.J.
    Journal of the Operational Research Society, 1995, 46 (06):
  • [42] Markov Decision Processes
    Bäuerle N.
    Rieder U.
    Jahresbericht der Deutschen Mathematiker-Vereinigung, 2010, 112 (4) : 217 - 243
  • [43] Noisy Bayesian Active Learning
    Naghshvar, Mohammad
    Javidi, Tara
    Chaudhuri, Kamalika
    2012 50TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2012, : 1626 - 1633
  • [44] Information-directed policy sampling for episodic Bayesian Markov decision processes
    Diaz, Victoria
    Ghate, Archis
    IISE TRANSACTIONS, 2024,
  • [45] Learning deterministic policies in partially observable Markov decision processes
    Miyazaki, K
    Kobayashi, S
    INTELLIGENT AUTONOMOUS SYSTEMS: IAS-5, 1998, : 250 - 257
  • [46] Counterexample Explanation by Learning Small Strategies in Markov Decision Processes
    Brazdil, Tomas
    Chatterjee, Krishnendu
    Chmelik, Martin
    Fellner, Andreas
    Kretinsky, Jan
    COMPUTER AIDED VERIFICATION, PT I, 2015, 9206 : 158 - 177
  • [47] REINFORCEMENT LEARNING OF NON-MARKOV DECISION-PROCESSES
    WHITEHEAD, SD
    LIN, LJ
    ARTIFICIAL INTELLIGENCE, 1995, 73 (1-2) : 271 - 306
  • [48] Learning in Non-Cooperative Configurable Markov Decision Processes
    Ramponi, Giorgia
    Metelli, Alberto Maria
    Concetti, Alessandro
    Restelli, Marcello
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [49] L*-based learning of Markov decision processes (extended version)
    Tappler, Martin
    Aichernig, Bernhard K.
    Bacci, Giovanni
    Eichlseder, Maria
    Larsen, Kim G.
    FORMAL ASPECTS OF COMPUTING, 2021, 33 (4-5) : 575 - 615
  • [50] Permissive Supervisor Synthesis for Markov Decision Processes Through Learning
    Wu, Bo
    Zhang, Xiaobin
    Lin, Hai
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2019, 64 (08) : 3332 - 3338