Probabilistic Inference in Reinforcement Learning Done Right

Cited by: 0

Authors:

Tarbouriech, Jean [1]

Lattimore, Tor [1]

O'Donoghue, Brendan [1]

Affiliations:

[1] Google DeepMind, London, England

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023

Keywords:

ENTROPY;

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

A popular perspective in reinforcement learning (RL) casts the problem as probabilistic inference on a graphical model of the Markov decision process (MDP). The core object of study is the probability of each state-action pair being visited under the optimal policy. Previous approaches to approximate this quantity can be arbitrarily poor, leading to algorithms that do not implement genuine statistical inference and consequently do not perform well in challenging problems. In this work, we undertake a rigorous Bayesian treatment of the posterior probability of state-action optimality and clarify how it flows through the MDP. We first reveal that this quantity can indeed be used to generate a policy that explores efficiently, as measured by regret. Unfortunately, computing it is intractable, so we derive a new variational Bayesian approximation yielding a tractable convex optimization problem and establish that the resulting policy also explores efficiently. We call our approach VAPOR and show that it has strong connections to Thompson sampling, K-learning, and maximum entropy exploration. We conclude with some experiments demonstrating the performance advantage of a deep RL version of VAPOR.

Pages: 39
