Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation

Cited by: 0
Authors
Zhang, Weitong [1 ]
Zhou, Dongruo [1 ]
Gu, Quanquan [1 ]
Affiliations
[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA
Funding
National Science Foundation (USA);
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study model-based reward-free reinforcement learning with linear function approximation for episodic Markov decision processes (MDPs). In this setting, the agent operates in two phases. In the exploration phase, the agent interacts with the environment and collects samples without observing any reward. In the planning phase, the agent is given a specific reward function and uses the samples collected during the exploration phase to learn a good policy. We propose a new, provably efficient algorithm, called UCRL-RFE, under the linear mixture MDP assumption, where the transition probability kernel of the MDP can be parameterized by a linear function over certain feature mappings defined on the triplet of state, action, and next state. We show that to obtain an $\epsilon$-optimal policy for an arbitrary reward function, UCRL-RFE needs to sample at most $\tilde{O}(H^5 d^2 \epsilon^{-2})$ episodes during the exploration phase, where $H$ is the length of the episode and $d$ is the dimension of the feature mapping. We also propose a variant of UCRL-RFE using a Bernstein-type bonus and show that it needs to sample at most $\tilde{O}(H^4 d(H + d)\epsilon^{-2})$ episodes to achieve an $\epsilon$-optimal policy. By constructing a special class of linear mixture MDPs, we also prove that any reward-free algorithm needs to sample at least $\tilde{\Omega}(H^2 d \epsilon^{-2})$ episodes to obtain an $\epsilon$-optimal policy. Our upper bound matches the lower bound in terms of the dependence on $\epsilon$, and in terms of the dependence on $d$ when $H \geq d$.
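As a rough illustration of the two-phase protocol described in the abstract, the sketch below simulates a small synthetic linear mixture MDP, collects reward-free trajectories with a uniform exploration policy, fits the transition parameter by ridge regression, and then plans by value iteration once a reward function is revealed. The problem sizes, the feature map phi, and the uniform exploration policy are illustrative assumptions for exposition only; this is not the paper's UCRL-RFE algorithm, which instead drives exploration with an uncertainty-based bonus.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, H, d = 5, 3, 4, 6                        # toy sizes: states, actions, horizon, feature dim

# Feature map phi(s, a, s') in R^d and a ground-truth parameter theta*.
# Normalizing the scores below breaks the exact linear structure; it is kept
# only to obtain a valid transition kernel for this demo.
phi = rng.random((S, A, S, d))
theta_star = rng.random(d)
P = np.einsum('sajd,d->saj', phi, theta_star)
P = P / P.sum(axis=2, keepdims=True)

def explore(num_episodes):
    """Exploration phase: collect transitions WITHOUT observing any reward."""
    data = []
    for _ in range(num_episodes):
        s = 0
        for _h in range(H):
            a = int(rng.integers(A))           # placeholder: uniform exploration
            s_next = int(rng.choice(S, p=P[s, a]))
            data.append((s, a, s_next))
            s = s_next
    return data

def estimate_theta(data, lam=1.0):
    """One crude way to fit the linear mixture model from reward-free samples:
    ridge regression of next-state indicators on the features phi(s, a, s')."""
    A_mat, b = lam * np.eye(d), np.zeros(d)
    for (s, a, s_next) in data:
        for sp in range(S):
            x = phi[s, a, sp]
            y = 1.0 if sp == s_next else 0.0
            A_mat += np.outer(x, x)
            b += y * x
    return np.linalg.solve(A_mat, b)

def plan(theta_hat, reward):
    """Planning phase: value iteration in the estimated model for a reward
    function that is revealed only after exploration has finished."""
    P_hat = np.einsum('sajd,d->saj', phi, theta_hat)
    P_hat = np.clip(P_hat, 0.0, None)
    P_hat = P_hat / np.maximum(P_hat.sum(axis=2, keepdims=True), 1e-8)
    V = np.zeros(S)
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = reward + P_hat @ V                 # Q[s, a] = r(s, a) + E_hat[V(s')]
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy, V

data = explore(num_episodes=500)
theta_hat = estimate_theta(data)
reward = rng.random((S, A))                    # reward revealed in the planning phase
policy, V = plan(theta_hat, reward)
print("estimated theta:", np.round(theta_hat, 3))
print("value of the initial state under the planned policy:", V[0])
```

In the paper's algorithm, the exploratory actions are chosen via an optimism-style bonus (Hoeffding- or Bernstein-type) rather than uniformly at random; that bonus-driven exploration is what yields the stated sample-complexity bounds.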
Pages: 12