Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation

Cited by: 0
Authors
Zhang, Weitong [1 ]
Zhou, Dongruo [1 ]
Gu, Quanquan [1 ]
Affiliations
[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA
Funding
National Science Foundation (USA);
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study model-based reward-free reinforcement learning with linear function approximation for episodic Markov decision processes (MDPs). In this setting, the agent operates in two phases. In the exploration phase, the agent interacts with the environment and collects samples without observing any reward. In the planning phase, the agent is given a specific reward function and uses the samples collected during the exploration phase to learn a good policy. We propose a new, provably efficient algorithm, called UCRL-RFE, under the linear mixture MDP assumption, where the transition probability kernel of the MDP can be parameterized by a linear function over certain feature mappings defined on the triplet of state, action, and next state. We show that to obtain an $\epsilon$-optimal policy for an arbitrary reward function, UCRL-RFE needs to sample at most $\tilde{O}(H^5 d^2 \epsilon^{-2})$ episodes during the exploration phase, where $H$ is the length of the episode and $d$ is the dimension of the feature mapping. We also propose a variant of UCRL-RFE using a Bernstein-type bonus and show that it needs to sample at most $\tilde{O}(H^4 d(H + d)\epsilon^{-2})$ episodes to achieve an $\epsilon$-optimal policy. By constructing a special class of linear mixture MDPs, we also prove that any reward-free algorithm needs to sample at least $\tilde{\Omega}(H^2 d \epsilon^{-2})$ episodes to obtain an $\epsilon$-optimal policy. Our upper bound matches the lower bound in terms of the dependence on $\epsilon$, and in terms of the dependence on $d$ when $H \geq d$.
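As a rough illustration of the two-phase protocol described in the abstract, the sketch below simulates a small synthetic linear mixture MDP, collects reward-free trajectories with a uniform exploration policy, fits the transition parameter by ridge regression, and then plans by value iteration once a reward function is revealed. The problem sizes, the feature map phi, and the uniform exploration policy are illustrative assumptions for exposition only; this is not the paper's UCRL-RFE algorithm, which instead drives exploration with an uncertainty-based bonus.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, H, d = 5, 3, 4, 6                        # toy sizes: states, actions, horizon, feature dim

# Feature map phi(s, a, s') in R^d and a ground-truth parameter theta*.
# Normalizing the scores below breaks the exact linear structure; it is kept
# only to obtain a valid transition kernel for this demo.
phi = rng.random((S, A, S, d))
theta_star = rng.random(d)
P = np.einsum('sajd,d->saj', phi, theta_star)
P = P / P.sum(axis=2, keepdims=True)

def explore(num_episodes):
    """Exploration phase: collect transitions WITHOUT observing any reward."""
    data = []
    for _ in range(num_episodes):
        s = 0
        for _h in range(H):
            a = int(rng.integers(A))           # placeholder: uniform exploration
            s_next = int(rng.choice(S, p=P[s, a]))
            data.append((s, a, s_next))
            s = s_next
    return data

def estimate_theta(data, lam=1.0):
    """One crude way to fit the linear mixture model from reward-free samples:
    ridge regression of next-state indicators on the features phi(s, a, s')."""
    A_mat, b = lam * np.eye(d), np.zeros(d)
    for (s, a, s_next) in data:
        for sp in range(S):
            x = phi[s, a, sp]
            y = 1.0 if sp == s_next else 0.0
            A_mat += np.outer(x, x)
            b += y * x
    return np.linalg.solve(A_mat, b)

def plan(theta_hat, reward):
    """Planning phase: value iteration in the estimated model for a reward
    function that is revealed only after exploration has finished."""
    P_hat = np.einsum('sajd,d->saj', phi, theta_hat)
    P_hat = np.clip(P_hat, 0.0, None)
    P_hat = P_hat / np.maximum(P_hat.sum(axis=2, keepdims=True), 1e-8)
    V = np.zeros(S)
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = reward + P_hat @ V                 # Q[s, a] = r(s, a) + E_hat[V(s')]
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy, V

data = explore(num_episodes=500)
theta_hat = estimate_theta(data)
reward = rng.random((S, A))                    # reward revealed in the planning phase
policy, V = plan(theta_hat, reward)
print("estimated theta:", np.round(theta_hat, 3))
print("value of the initial state under the planned policy:", V[0])
```

In the paper's algorithm, the exploratory actions are chosen via an optimism-style bonus (Hoeffding- or Bernstein-type) rather than uniformly at random; that bonus-driven exploration is what yields the stated sample-complexity bounds.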
Pages: 12