Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation

Cited by: 0
Authors
Zhang, Weitong [1 ]
Zhou, Dongruo [1 ]
Gu, Quanquan [1 ]
Affiliations
[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA
Funding
US National Science Foundation;
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study model-based reward-free reinforcement learning with linear function approximation for episodic Markov decision processes (MDPs). In this setting, the agent operates in two phases. In the exploration phase, it interacts with the environment and collects samples without observing rewards. In the planning phase, it is given a specific reward function and uses the samples collected during exploration to learn a good policy. We propose a new provably efficient algorithm, called UCRL-RFE, under the linear mixture MDP assumption, where the transition probability kernel of the MDP can be parameterized by a linear function over certain feature mappings defined on the triplet of state, action, and next state. We show that, to obtain an $\epsilon$-optimal policy for an arbitrary reward function, UCRL-RFE needs to sample at most $\tilde{O}(H^5 d^2 \epsilon^{-2})$ episodes during the exploration phase, where $H$ is the length of an episode and $d$ is the dimension of the feature mapping. We also propose a variant of UCRL-RFE using a Bernstein-type bonus and show that it needs to sample at most $\tilde{O}(H^4 d(H + d)\epsilon^{-2})$ episodes to achieve an $\epsilon$-optimal policy. By constructing a special class of linear mixture MDPs, we also prove that any reward-free algorithm needs to sample at least $\tilde{\Omega}(H^2 d \epsilon^{-2})$ episodes to obtain an $\epsilon$-optimal policy. Our upper bound matches the lower bound in its dependence on $\epsilon$, and in its dependence on $d$ when $H \geq d$.
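To make the setting concrete, here is a minimal LaTeX sketch of the linear mixture MDP assumption described in the abstract, together with the reported sample complexities. The symbols $\phi$ (the feature mapping on the triplet $(s, a, s')$) and $\theta^*$ (the unknown $d$-dimensional parameter) are notation assumed here for illustration, not quoted from the paper.

% Linear mixture MDP assumption (as described in the abstract): the
% transition kernel is a linear function of a known feature mapping phi
% over (state, action, next state), with unknown parameter theta*.
\[
  \mathbb{P}(s' \mid s, a) = \big\langle \phi(s, a, s'),\, \theta^* \big\rangle,
  \qquad \theta^* \in \mathbb{R}^d .
\]
% Sample complexities stated in the abstract (tildes hide logarithmic factors):
\[
  \underbrace{\tilde{O}\big(H^5 d^2 \epsilon^{-2}\big)}_{\text{UCRL-RFE}} , \qquad
  \underbrace{\tilde{O}\big(H^4 d (H + d)\, \epsilon^{-2}\big)}_{\text{Bernstein-type bonus variant}} , \qquad
  \underbrace{\tilde{\Omega}\big(H^2 d\, \epsilon^{-2}\big)}_{\text{lower bound}} .
\]

When $H \geq d$, the Bernstein-variant upper bound simplifies to $\tilde{O}(H^5 d \epsilon^{-2})$, which is linear in $d$ just like the lower bound; this is the sense in which the upper and lower bounds match in their dependence on $d$ and $\epsilon$.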
Pages: 12
Related Papers
50 records in total
  • [1] On Reward-Free Reinforcement Learning with Linear Function Approximation
    Wang, Ruosong
    Du, Simon S.
    Yang, Lin F.
    Salakhutdinov, Ruslan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [2] Reward-Free Exploration for Reinforcement Learning
    Jin, Chi
    Krishnamurthy, Akshay
    Simchowitz, Max
    Yu, Tiancheng
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [3] Nearly Optimal Reward-Free Reinforcement Learning
    Zhang, Zihan
    Du, Simon S.
    Ji, Xiangyang
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [4] A Simple Reward-free Approach to Constrained Reinforcement Learning
    Miryoosefi, Sobhan
    Jin, Chi
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022, 162
  • [5] Reward-Free Policy Space Compression for Reinforcement Learning
    Mutti, Mirco
    Del Col, Stefano
    Restelli, Marcello
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [6] Robust Reward-Free Actor-Critic for Cooperative Multiagent Reinforcement Learning
    Lin, Qifeng
    Ling, Qing
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (12) : 17318 - 17329
  • [7] Reward-Free Reinforcement Learning Algorithm Using Prediction Network
    Yu, Zhen
    Feng, Yimin
    Liu, Lijun
    FUZZY SYSTEMS AND DATA MINING VI, 2020, 331 : 663 - 670
  • [8] Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings
    Yin, Ming
    Wang, Yu-Xiang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34, NEURIPS 2021, 2021, 34
  • [9] Model-Based Reinforcement Learning with Multinomial Logistic Function Approximation
    Hwang, Taehyun
    Oh, Min-Hwan
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 7, 2023, : 7971 - 7979
  • [10] An Analysis of Feature Selection and Reward Function for Model-Based Reinforcement Learning
    Shen, Shitian
    Lin, Chen
    Mostafavi, Behrooz
    Barnes, Tiffany
    Chi, Min
    INTELLIGENT TUTORING SYSTEMS, ITS 2016, 2016, 9684 : 504 - 505