Probabilistic Inference in Reinforcement Learning Done Right

Cited: 0
Authors
Tarbouriech, Jean [1]
Lattimore, Tor [1]
O'Donoghue, Brendan [1]
Affiliations
[1] Google DeepMind, London, England
Keywords
ENTROPY
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
A popular perspective in reinforcement learning (RL) casts the problem as probabilistic inference on a graphical model of the Markov decision process (MDP). The core object of study is the probability of each state-action pair being visited under the optimal policy. Previous approaches to approximate this quantity can be arbitrarily poor, leading to algorithms that do not implement genuine statistical inference and consequently do not perform well in challenging problems. In this work, we undertake a rigorous Bayesian treatment of the posterior probability of state-action optimality and clarify how it flows through the MDP. We first reveal that this quantity can indeed be used to generate a policy that explores efficiently, as measured by regret. Unfortunately, computing it is intractable, so we derive a new variational Bayesian approximation yielding a tractable convex optimization problem and establish that the resulting policy also explores efficiently. We call our approach VAPOR and show that it has strong connections to Thompson sampling, K-learning, and maximum entropy exploration. We conclude with some experiments demonstrating the performance advantage of a deep RL version of VAPOR.
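The abstract does not spell out the variational program, but it notes that VAPOR reduces to a tractable convex optimization over state-action visitation probabilities and is connected to maximum entropy exploration. The following is a minimal illustrative sketch of that general idea only: an entropy-regularized convex program over an occupancy measure subject to Bellman flow constraints, solved on a hypothetical two-state MDP. It is not the paper's exact VAPOR objective; the MDP, rewards, temperature `tau`, and solver choice are all assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical 2-state, 2-action MDP (numbers chosen for illustration only).
nS, nA = 2, 2
P = np.zeros((nS, nA, nS))              # P[s, a, s'] transition probabilities
P[0, 0] = [0.9, 0.1]
P[0, 1] = [0.2, 0.8]
P[1, 0] = [0.5, 0.5]
P[1, 1] = [0.1, 0.9]
r = np.array([[0.0, 0.1],               # rewards r[s, a]
              [0.2, 1.0]])
gamma, tau = 0.9, 0.1                   # discount factor, entropy temperature
mu0 = np.array([1.0, 0.0])              # initial state distribution

def neg_objective(lam_flat):
    """Negated concave objective: expected reward + entropy of the occupancy."""
    lam = lam_flat.reshape(nS, nA)
    entropy = -np.sum(lam * np.log(lam + 1e-12))
    return -(np.sum(lam * r) + tau * entropy)

def flow_residual(lam_flat):
    """Bellman flow constraint on the (normalized) discounted occupancy measure:
    sum_a lam(s', a) = (1 - gamma) * mu0(s') + gamma * sum_{s,a} P[s,a,s'] lam(s,a)."""
    lam = lam_flat.reshape(nS, nA)
    inflow = (1 - gamma) * mu0 + gamma * np.einsum('sap,sa->p', P, lam)
    return lam.sum(axis=1) - inflow

# Solve the convex program; this is concave maximization over a polytope,
# so a local method like SLSQP finds the global optimum.
x0 = np.full(nS * nA, 1.0 / (nS * nA))
res = minimize(neg_objective, x0, method='SLSQP',
               bounds=[(1e-9, 1.0)] * (nS * nA),
               constraints=[{'type': 'eq', 'fun': flow_residual}])

lam = res.x.reshape(nS, nA)
policy = lam / lam.sum(axis=1, keepdims=True)   # recover a stochastic policy
print(np.round(policy, 3))
```

Recovering the policy as the conditional `lam(s, a) / sum_a lam(s, a)` is the standard way to turn an optimized occupancy measure back into a stochastic policy; larger `tau` pushes the solution toward more uniform (more exploratory) action choices.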
Pages: 39