Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation

Cited by: 0
Authors
Zhang, Weitong [1 ]
Zhou, Dongruo [1 ]
Gu, Quanquan [1 ]
Affiliations
[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA
Funding
National Science Foundation (NSF), USA
Keywords
(none listed)
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We study model-based reward-free reinforcement learning with linear function approximation for episodic Markov decision processes (MDPs). In this setting, the agent operates in two phases. In the exploration phase, it interacts with the environment and collects samples without observing rewards. In the planning phase, it is given a specific reward function and uses the samples collected during exploration to learn a good policy. We propose a new provably efficient algorithm, UCRL-RFE, under the linear mixture MDP assumption, where the transition probability kernel of the MDP can be parameterized by a linear function over certain feature mappings defined on the triplet of state, action, and next state. We show that to obtain an ε-optimal policy for an arbitrary reward function, UCRL-RFE needs to sample at most Õ(H^5 d^2 ε^{-2}) episodes during the exploration phase, where H is the length of an episode and d is the dimension of the feature mapping. We also propose a variant of UCRL-RFE with a Bernstein-type bonus and show that it needs to sample at most Õ(H^4 d(H + d) ε^{-2}) episodes to achieve an ε-optimal policy. By constructing a special class of linear mixture MDPs, we further prove that any reward-free algorithm needs to sample at least Ω̃(H^2 d ε^{-2}) episodes to obtain an ε-optimal policy. Our upper bound matches the lower bound in its dependence on ε, and in its dependence on d when H ≥ d.
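The two-phase protocol described in the abstract can be sketched in code. The following is a minimal illustrative sketch, not the authors' UCRL-RFE algorithm: all names (`phi`, `theta_star`, the uniform exploration policy, the count-based model estimate) are assumptions made for illustration. It shows the linear mixture structure, where the transition kernel is a linear function of a known feature map over (state, action, next state), and the separation between reward-free exploration and reward-aware planning.

```python
import numpy as np

d, H = 3, 4                       # feature dimension, episode length
n_states, n_actions = 5, 2
rng = np.random.default_rng(0)

# Linear mixture MDP assumption: P(s' | s, a) = <theta*, phi(s, a, s')>
# for an unknown parameter theta* and a known feature map phi.
phi = rng.random((n_states, n_actions, n_states, d))
theta_star = rng.random(d)
P = phi @ theta_star                       # shape (S, A, S')
P /= P.sum(axis=2, keepdims=True)          # normalize into a valid kernel

def explore(num_episodes):
    """Exploration phase: collect transitions without observing any reward.
    A uniform-random policy stands in for the algorithm's bonus-driven one."""
    data = []
    for _ in range(num_episodes):
        s = 0
        for _ in range(H):
            a = rng.integers(n_actions)
            s_next = rng.choice(n_states, p=P[s, a])
            data.append((s, a, s_next))
            s = s_next
    return data

def plan(data, reward):
    """Planning phase: the reward function is revealed only now.
    Estimate the model from reward-free data, then run finite-horizon
    value iteration against the revealed reward."""
    counts = np.ones((n_states, n_actions, n_states))  # Laplace smoothing
    for s, a, s_next in data:
        counts[s, a, s_next] += 1
    P_hat = counts / counts.sum(axis=2, keepdims=True)
    V = np.zeros(n_states)
    for _ in range(H):
        Q = reward[:, None] + P_hat @ V    # Q(s, a); reward depends on s only
        V = Q.max(axis=1)
    return V

reward = rng.random(n_states)              # arbitrary reward, given post hoc
V = plan(explore(200), reward)
```

Because the data are collected before any reward is seen, the same dataset can serve the planning phase for every reward function; the paper's sample-complexity bounds quantify how many exploration episodes suffice for this to yield an ε-optimal policy uniformly over rewards.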
Pages: 12