Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation

Cited by: 0
Authors
Zhang, Weitong [1 ]
Zhou, Dongruo [1 ]
Gu, Quanquan [1 ]
Affiliations
[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA
Funding
US National Science Foundation;
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study model-based reward-free reinforcement learning with linear function approximation for episodic Markov decision processes (MDPs). In this setting, the agent operates in two phases. In the exploration phase, it interacts with the environment and collects samples without observing rewards. In the planning phase, it is given a specific reward function and uses the samples collected during exploration to learn a good policy. We propose a new provably efficient algorithm, called UCRL-RFE, under the linear mixture MDP assumption, where the transition probability kernel of the MDP can be parameterized by a linear function over certain feature mappings defined on the triplet of state, action, and next state. We show that, to obtain an $\epsilon$-optimal policy for an arbitrary reward function, UCRL-RFE needs to sample at most $\tilde{O}(H^5 d^2 \epsilon^{-2})$ episodes during the exploration phase, where $H$ is the length of an episode and $d$ is the dimension of the feature mapping. We also propose a variant of UCRL-RFE using a Bernstein-type bonus and show that it needs to sample at most $\tilde{O}(H^4 d(H + d)\epsilon^{-2})$ episodes to achieve an $\epsilon$-optimal policy. By constructing a special class of linear mixture MDPs, we also prove that any reward-free algorithm needs to sample at least $\tilde{\Omega}(H^2 d \epsilon^{-2})$ episodes to obtain an $\epsilon$-optimal policy. Our upper bound matches the lower bound in its dependence on $\epsilon$, and in its dependence on $d$ when $H \geq d$.
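To make the setting concrete, here is a minimal LaTeX sketch of the linear mixture MDP assumption described in the abstract, together with the reported sample complexities. The symbols $\phi$ (the feature mapping on the triplet $(s, a, s')$) and $\theta^*$ (the unknown $d$-dimensional parameter) are notation assumed here for illustration, not quoted from the paper.

% Linear mixture MDP assumption (as described in the abstract): the
% transition kernel is a linear function of a known feature mapping phi
% over (state, action, next state), with unknown parameter theta*.
\[
  \mathbb{P}(s' \mid s, a) = \big\langle \phi(s, a, s'),\, \theta^* \big\rangle,
  \qquad \theta^* \in \mathbb{R}^d .
\]
% Sample complexities stated in the abstract (tildes hide logarithmic factors):
\[
  \underbrace{\tilde{O}\big(H^5 d^2 \epsilon^{-2}\big)}_{\text{UCRL-RFE}} , \qquad
  \underbrace{\tilde{O}\big(H^4 d (H + d)\, \epsilon^{-2}\big)}_{\text{Bernstein-type bonus variant}} , \qquad
  \underbrace{\tilde{\Omega}\big(H^2 d\, \epsilon^{-2}\big)}_{\text{lower bound}} .
\]

When $H \geq d$, the Bernstein-variant upper bound simplifies to $\tilde{O}(H^5 d \epsilon^{-2})$, which is linear in $d$ just like the lower bound; this is the sense in which the upper and lower bounds match in their dependence on $d$ and $\epsilon$.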
Pages: 12
Related Papers
50 records in total
  • [1] On Reward-Free Reinforcement Learning with Linear Function Approximation
    Wang, Ruosong
    Du, Simon S.
    Yang, Lin F.
    Salakhutdinov, Ruslan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [2] Reward-Free Exploration for Reinforcement Learning
    Jin, Chi
    Krishnamurthy, Akshay
    Simchowitz, Max
    Yu, Tiancheng
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [3] Nearly Optimal Reward-Free Reinforcement Learning
    Zhang, Zihan
    Du, Simon S.
    Ji, Xiangyang
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [4] A Simple Reward-free Approach to Constrained Reinforcement Learning
    Miryoosefi, Sobhan
    Jin, Chi
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022, 162
  • [5] Reward-Free Policy Space Compression for Reinforcement Learning
    Mutti, Mirco
    Del Col, Stefano
    Restelli, Marcello
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [6] Robust Reward-Free Actor-Critic for Cooperative Multiagent Reinforcement Learning
    Lin, Qifeng
    Ling, Qing
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (12) : 17318 - 17329
  • [7] Reward-Free Reinforcement Learning Algorithm Using Prediction Network
    Yu, Zhen
    Feng, Yimin
    Liu, Lijun
    FUZZY SYSTEMS AND DATA MINING VI, 2020, 331 : 663 - 670
  • [8] Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings
    Yin, Ming
    Wang, Yu-Xiang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34, NEURIPS 2021, 2021, 34
  • [9] Model-Based Reinforcement Learning with Multinomial Logistic Function Approximation
    Hwang, Taehyun
    Oh, Min-Hwan
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 7, 2023, : 7971 - 7979
  • [10] An Analysis of Feature Selection and Reward Function for Model-Based Reinforcement Learning
    Shen, Shitian
    Lin, Chen
    Mostafavi, Behrooz
    Barnes, Tiffany
    Chi, Min
    INTELLIGENT TUTORING SYSTEMS, ITS 2016, 2016, 9684 : 504 - 505