Probabilistic Inference in Reinforcement Learning Done Right

Cited: 0
Authors
Tarbouriech, Jean [1 ]
Lattimore, Tor [1 ]
O'Donoghue, Brendan [1 ]
Affiliations
[1] Google DeepMind, London, England
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023
Keywords
ENTROPY;
DOI
Not available
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A popular perspective in reinforcement learning (RL) casts the problem as probabilistic inference on a graphical model of the Markov decision process (MDP). The core object of study is the probability of each state-action pair being visited under the optimal policy. Previous approaches to approximating this quantity can be arbitrarily poor, leading to algorithms that do not implement genuine statistical inference and consequently do not perform well in challenging problems. In this work, we undertake a rigorous Bayesian treatment of the posterior probability of state-action optimality and clarify how it flows through the MDP. We first reveal that this quantity can indeed be used to generate a policy that explores efficiently, as measured by regret. Unfortunately, computing it is intractable, so we derive a new variational Bayesian approximation yielding a tractable convex optimization problem, and establish that the resulting policy also explores efficiently. We call our approach VAPOR and show that it has strong connections to Thompson sampling, K-learning, and maximum entropy exploration. We conclude with some experiments demonstrating the performance advantage of a deep RL version of VAPOR.
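The abstract links VAPOR's variational convex problem to maximum entropy exploration and softmax-style policies (Thompson sampling, K-learning). A minimal illustrative sketch of that connection, not the paper's actual VAPOR algorithm: on a one-step problem, maximizing expected value plus a temperature-scaled Shannon entropy over the probability simplex is a convex (concave-maximization) problem whose solution is the softmax of the values. All names and the temperature value below are assumptions chosen for illustration.

```python
import numpy as np

def softmax_policy(q, tau):
    """Closed-form maximizer of  pi @ q + tau * H(pi)  over the simplex:
    pi*(a) proportional to exp(q(a) / tau)."""
    z = (q - q.max()) / tau          # shift by max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def objective(pi, q, tau):
    # expected value plus tau-weighted Shannon entropy
    entropy = -np.sum(pi * np.log(pi))
    return pi @ q + tau * entropy

q = np.array([1.0, 0.5, -0.2])       # hypothetical action values
tau = 0.3                            # entropy temperature (assumed)
pi_star = softmax_policy(q, tau)

# Sanity check: no random point on the simplex beats the softmax solution,
# consistent with the objective being concave with a unique maximizer.
rng = np.random.default_rng(0)
best = objective(pi_star, q, tau)
for _ in range(1000):
    p = rng.dirichlet(np.ones(3))
    assert objective(p, q, tau) <= best + 1e-9
```

Higher temperatures spread probability mass more evenly (more exploration); as `tau` shrinks, the policy concentrates on the greedy action, which is the usual intuition behind entropy-regularized exploration schemes.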
Pages: 39