Reward-free offline reinforcement learning: Optimizing behavior policy via action exploration

Cited: 0
Authors
Huang, Zhenbo [1 ]
Sun, Shiliang [2 ]
Zhao, Jing [1 ]
Affiliations
[1] East China Normal Univ, Sch Comp Sci & Technol, Shanghai 200062, Peoples R China
[2] Shanghai Jiao Tong Univ, Dept Automat, Shanghai 200240, Peoples R China
Keywords
Offline reinforcement learning; Reward-free learning; Action exploration;
DOI
10.1016/j.knosys.2024.112018
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Offline reinforcement learning (RL) aims to learn a policy from pre-collected data, avoiding costly or risky interactions with the environment. In the offline setting, the inherent problem of distribution shift leads to extrapolation error, causing policy learning to fail. Conventional offline RL methods tackle this by reducing the value estimates of unseen actions or by imposing policy constraints. However, these methods confine the agent's actions to the data manifold, limiting its capacity to gain new insights from actions beyond the dataset's scope. To address this, we propose a novel offline RL method that incorporates action exploration, called EoRL. We partition policy learning into behavior learning and exploration learning: exploration learning empowers the agent to discover novel actions, while behavior learning approximates the behavior policy. Specifically, in exploration learning, we define the deviation between decision actions and dataset actions as the action novelty and replace the traditional reward with an assessment of the policy's cumulative novelty. The behavior policy, in turn, restricts actions to the vicinity of dataset-supported actions, and the two parts of policy learning share parameters. We show that EoRL explores a larger action space while controlling the policy shift, and that its reward-free learning model is better suited to realistic task scenarios. Experimental results demonstrate the strong performance of our method on MuJoCo locomotion and 2D maze tasks.
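To make the abstract's split between behavior learning and exploration learning concrete, the following is a minimal PyTorch sketch of the idea: a shared trunk with a behavior head trained to stay near dataset actions and an exploration head trained on an action-novelty signal in place of a reward. It is an illustration under stated assumptions, not the authors' implementation; the names (SharedPolicy, eorl_losses), the per-step L2 novelty measure, and the radius constraint are all assumptions introduced here.

```python
# Illustrative sketch only: class/function names, the L2 novelty measure, and the
# radius hyperparameter are assumptions, not the paper's released code.
import torch
import torch.nn as nn

class SharedPolicy(nn.Module):
    """Behavior and exploration heads sharing a common trunk (assumed structure)."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.behavior_head = nn.Linear(hidden, action_dim)     # approximates the behavior policy
        self.exploration_head = nn.Linear(hidden, action_dim)  # proposes novel actions

    def forward(self, state):
        h = self.trunk(state)
        return torch.tanh(self.behavior_head(h)), torch.tanh(self.exploration_head(h))

def eorl_losses(policy, states, dataset_actions, radius=0.2):
    """Losses for one batch of offline transitions (per-step novelty, not cumulative, for brevity)."""
    b_act, e_act = policy(states)
    # Behavior learning: stay close to dataset-supported actions (behavior cloning).
    behavior_loss = ((b_act - dataset_actions) ** 2).mean()
    # Exploration learning: action novelty = deviation from the dataset action,
    # maximized in place of a reward signal.
    novelty = torch.norm(e_act - dataset_actions, dim=-1)
    exploration_loss = -novelty.mean()
    # Keep exploratory actions within a neighborhood of the behavior action
    # so the policy shift stays controlled (radius is an assumed hyperparameter).
    shift_penalty = torch.relu(torch.norm(e_act - b_act.detach(), dim=-1) - radius).mean()
    return behavior_loss + exploration_loss + shift_penalty

# Minimal usage on random tensors standing in for an offline dataset.
policy = SharedPolicy(state_dim=17, action_dim=6)
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
states, actions = torch.randn(64, 17), torch.rand(64, 6) * 2 - 1
loss = eorl_losses(policy, states, actions)
opt.zero_grad(); loss.backward(); opt.step()
```

The shared trunk reflects the abstract's statement that the two parts of policy learning share parameters; how the novelty is accumulated over trajectories and traded off against the behavior term in the actual method would follow the paper, not this sketch.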
Pages: 13