Probabilistic Inference in Reinforcement Learning Done Right

Times cited: 0
Authors
Tarbouriech, Jean [1]
Lattimore, Tor [1]
O'Donoghue, Brendan [1]
Affiliation
[1] Google DeepMind, London, England
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023
Keywords
ENTROPY
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A popular perspective in reinforcement learning (RL) casts the problem as probabilistic inference on a graphical model of the Markov decision process (MDP). The core object of study is the probability of each state-action pair being visited under the optimal policy. Previous approaches to approximating this quantity can be arbitrarily poor, leading to algorithms that do not implement genuine statistical inference and consequently perform poorly on challenging problems. In this work, we undertake a rigorous Bayesian treatment of the posterior probability of state-action optimality and clarify how it flows through the MDP. We first show that this quantity can indeed be used to generate a policy that explores efficiently, as measured by regret. Unfortunately, computing it is intractable, so we derive a new variational Bayesian approximation that yields a tractable convex optimization problem, and we establish that the resulting policy also explores efficiently. We call our approach VAPOR and show that it has strong connections to Thompson sampling, K-learning, and maximum entropy exploration. We conclude with experiments demonstrating the performance advantage of a deep RL version of VAPOR.
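The abstract's central quantity, the posterior probability that an action is optimal, is easiest to see in the degenerate single-state case (a multi-armed bandit). The sketch below is not VAPOR and not the authors' code. It assumes Bernoulli rewards with independent Beta(1, 1) priors and estimates each arm's probability of optimality by Monte Carlo sampling from the posterior. It also shows Thompson sampling, which selects each arm with exactly that probability and is one of the connections the abstract mentions. The function names `prob_optimal` and `thompson_action` are hypothetical. In the full MDP setting, this probability must also account for how optimality propagates through transitions, which this sketch does not attempt.

```python
import random


def prob_optimal(successes, failures, num_samples=10000, seed=0):
    """Monte Carlo estimate of P(arm a is optimal | data) for a Beta-Bernoulli bandit.

    Illustrative assumption: each arm has a uniform Beta(1, 1) prior, so its
    posterior is Beta(1 + successes, 1 + failures).
    """
    rng = random.Random(seed)
    k = len(successes)
    counts = [0] * k
    for _ in range(num_samples):
        # Draw one plausible mean reward per arm from its Beta posterior.
        draws = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
        # Credit the arm that is optimal under this posterior draw.
        best = max(range(k), key=lambda a: draws[a])
        counts[best] += 1
    return [c / num_samples for c in counts]


def thompson_action(successes, failures, rng):
    """Thompson sampling: take one posterior draw per arm and act greedily on it.

    The chosen arm is distributed according to its posterior probability of optimality.
    """
    draws = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=lambda a: draws[a])


if __name__ == "__main__":
    # Arm 0 has 8 successes and 2 failures. Arm 1 has 2 successes and 8 failures.
    print(prob_optimal([8, 2], [2, 8]))
```

With this data, almost all posterior mass favours arm 0, so its estimated probability of optimality is close to 1. Thompson sampling would therefore pick arm 0 on nearly every step while still occasionally trying arm 1.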
Pages: 39