Probabilistic Inference in Reinforcement Learning Done Right

Cited: 0
Authors
Tarbouriech, Jean [1 ]
Lattimore, Tor [1 ]
O'Donoghue, Brendan [1 ]
Affiliations
[1] Google DeepMind, London, England
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023
Keywords
ENTROPY
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
A popular perspective in reinforcement learning (RL) casts the problem as probabilistic inference on a graphical model of the Markov decision process (MDP). The core object of study is the probability of each state-action pair being visited under the optimal policy. Previous approaches to approximating this quantity can be arbitrarily poor, leading to algorithms that do not implement genuine statistical inference and consequently do not perform well in challenging problems. In this work, we undertake a rigorous Bayesian treatment of the posterior probability of state-action optimality and clarify how it flows through the MDP. We first reveal that this quantity can indeed be used to generate a policy that explores efficiently, as measured by regret. Unfortunately, computing it is intractable, so we derive a new variational Bayesian approximation yielding a tractable convex optimization problem and establish that the resulting policy also explores efficiently. We call our approach VAPOR and show that it has strong connections to Thompson sampling, K-learning, and maximum entropy exploration. We conclude with some experiments demonstrating the performance advantage of a deep RL version of VAPOR.
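The abstract's key computational claim is that the variational approximation reduces to a tractable convex program. The sketch below is only a generic illustration of that shape: a concave maximization over the polytope of valid occupancy measures with an entropy-style uncertainty bonus. It is not the paper's actual VAPOR objective or code; the toy MDP, the posterior statistics r_hat and sigma, and the form of the bonus are all hypothetical placeholders (Python, assuming numpy and cvxpy are available).

import numpy as np
import cvxpy as cp

# Hypothetical toy MDP; every quantity here is illustrative, not from the paper.
n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s'] transitions
r_hat = rng.uniform(size=(n_states, n_actions))   # placeholder posterior mean rewards
sigma = 0.2 * np.ones((n_states, n_actions))      # placeholder posterior uncertainty
rho = np.full(n_states, 1.0 / n_states)           # initial state distribution

# Decision variable: discounted state-action occupancy measure.
lam = cp.Variable((n_states, n_actions), nonneg=True)

# Bellman-flow constraints defining the polytope of valid occupancy measures:
# sum_a lam(s, a) = (1 - gamma) * rho(s) + gamma * sum_{s', a'} P(s | s', a') lam(s', a')
flow_in = cp.hstack([cp.sum(cp.multiply(P[:, :, s], lam)) for s in range(n_states)])
constraints = [cp.sum(lam, axis=1) == (1 - gamma) * rho + gamma * flow_in]

# Concave objective: expected posterior-mean reward plus an entropy-style
# uncertainty bonus (cp.entr(x) = -x*log(x)), which rewards spreading
# occupancy mass over state-action pairs with high posterior uncertainty.
objective = cp.Maximize(cp.sum(cp.multiply(lam, r_hat))
                        + cp.sum(cp.multiply(sigma, cp.entr(lam))))
cp.Problem(objective, constraints).solve()

# Normalize the optimal occupancy measure into a stochastic policy.
policy = lam.value / lam.value.sum(axis=1, keepdims=True)
print(np.round(policy, 3))

The point of the sketch is structural: once the posterior uncertainty enters the objective through a concave bonus, the whole exploration problem becomes a single convex program over occupancies, which is what makes the variational approach tractable where the exact posterior is not.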
Pages: 39
Related Papers (50 total)
  • [21] Guiding inference through relational reinforcement learning
    Asgharbeygi, N
    Nejati, N
    Langley, P
    Arai, S
    INDUCTIVE LOGIC PROGRAMMING, PROCEEDINGS, 2005, 3625 : 20 - 37
  • [22] VIREL: A Variational Inference Framework for Reinforcement Learning
    Fellows, Matthew
    Mahajan, Anuj
    Rudner, Tim G. J.
    Whiteson, Shimon
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [23] Deep reinforcement learning with significant multiplications inference
    Ivanov, Dmitry A.
    Larionov, Denis A.
    Kiselev, Mikhail V.
    Dylov, Dmitry V.
    SCIENTIFIC REPORTS, 2023, 13 (01)
  • [25] Fuzzy inference system learning by reinforcement methods
    Jouffe, L
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 1998, 28 (03): 338 - 355
  • [26] Symbolic Task Inference in Deep Reinforcement Learning
    Hasanbeig, Hosein
    Jeppu, Natasha Yogananda
    Abate, Alessandro
    Melham, Tom
    Kroening, Daniel
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2024, 80 : 1099 - 1137
  • [27] Online reinforcement learning control by Bayesian inference
    Xia, Zhongpu
    Zhao, Dongbin
IET CONTROL THEORY AND APPLICATIONS, 2016, 10 (12): 1331 - 1338
  • [28] Probabilistic Counterexample Guidance for Safer Reinforcement Learning
    Ji, Xiaotong
    Filieri, Antonio
    QUANTITATIVE EVALUATION OF SYSTEMS, QEST 2023, 2023, 14287 : 311 - 328
  • [29] Probabilistic Guarantees for Safe Deep Reinforcement Learning
    Bacci, Edoardo
    Parker, David
    FORMAL MODELING AND ANALYSIS OF TIMED SYSTEMS, FORMATS 2020, 2020, 12288 : 231 - 248
  • [30] Testing probabilistic equivalence through reinforcement learning
    Desharnais, Josee
    Laviolette, Francois
    Zhioua, Sami
FSTTCS 2006: FOUNDATIONS OF SOFTWARE TECHNOLOGY AND THEORETICAL COMPUTER SCIENCE, PROCEEDINGS, 2006, 4337 : 236+