Reward-free offline reinforcement learning: Optimizing behavior policy via action exploration

Cited by: 0
Authors
Huang, Zhenbo [1 ]
Sun, Shiliang [2 ]
Zhao, Jing [1 ]
Affiliations
[1] East China Normal Univ, Sch Comp Sci & Technol, Shanghai 200062, Peoples R China
[2] Shanghai Jiao Tong Univ, Dept Automat, Shanghai 200240, Peoples R China
Keywords
Offline reinforcement learning; Reward-free learning; Action exploration
DOI
10.1016/j.knosys.2024.112018
CLC classification
TP18 [Theory of Artificial Intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Offline reinforcement learning (RL) aims to learn a policy from pre-collected data, avoiding costly or risky interactions with the environment. In the offline setting, the inherent problem of distribution shift leads to extrapolation error, causing policy learning to fail. Conventional offline RL methods tackle this by reducing the value estimates of unseen actions or by incorporating policy constraints. However, these methods confine the agent's actions to the data manifold, hampering its capacity to gain new insights from actions beyond the dataset's scope. To address this, we propose a novel offline RL method that incorporates action exploration, called EoRL. We partition policy learning into behavior learning and exploration learning: exploration learning empowers the agent to discover novel actions, while behavior learning approximates the behavior policy. Specifically, in exploration learning, we define the deviation between decision actions and dataset actions as the action novelty and replace the traditional reward with an assessment of the policy's cumulative novelty. In behavior learning, the policy restricts actions to the vicinity of the dataset-supported actions, and the two parts of policy learning share parameters. We demonstrate that EoRL explores a larger action space while controlling the policy shift, and that its reward-free learning model is better suited to realistic task scenarios. Experimental results demonstrate the outstanding performance of our method on MuJoCo locomotion and 2D maze tasks.
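As a reading aid, the following is a minimal sketch, under stated assumptions, of the two coupled objectives the abstract describes: a behavior-learning term that keeps decided actions near dataset-supported actions, and an exploration-learning term that maximizes action novelty (the deviation between decided and dataset actions) in place of an environment reward. The shared-trunk network, the L2 form of the novelty, and the loss weighting are illustrative assumptions rather than the authors' implementation, and the per-batch novelty stands in for the cumulative novelty value the paper describes.

import torch
import torch.nn as nn

class SharedPolicy(nn.Module):
    """One trunk shared by behavior learning and exploration learning (assumed architecture)."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.behavior_head = nn.Linear(hidden, action_dim)     # approximates the behavior policy
        self.exploration_head = nn.Linear(hidden, action_dim)  # proposes novel actions

    def forward(self, state):
        h = self.trunk(state)
        return torch.tanh(self.behavior_head(h)), torch.tanh(self.exploration_head(h))

def eorl_style_losses(policy, states, dataset_actions, novelty_weight=1.0):
    """Behavior loss pulls decided actions toward dataset actions; the exploration loss
    rewards deviation from them (a per-batch stand-in for cumulative novelty)."""
    behavior_action, explore_action = policy(states)
    behavior_loss = ((behavior_action - dataset_actions) ** 2).mean()
    novelty = ((explore_action - dataset_actions) ** 2).sum(dim=-1).mean()  # assumed L2 novelty
    return behavior_loss, -novelty_weight * novelty

# Toy usage with random tensors, only to show the shapes involved.
policy = SharedPolicy(state_dim=17, action_dim=6)
states = torch.randn(32, 17)
dataset_actions = torch.rand(32, 6) * 2 - 1
b_loss, e_loss = eorl_style_losses(policy, states, dataset_actions)
(b_loss + e_loss).backward()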
Pages: 13
Related papers (50 total)
  • [31] Offline Reinforcement Learning with Failure Under Sparse Reward Environments. Wu, Mingkang; Siddique, Umer; Sinha, Abhinav; Cao, Yongcan. 2024 IEEE 3rd International Conference on Computing and Machine Intelligence (ICMI 2024), 2024.
  • [32] Expansive Latent Planning for Sparse Reward Offline Reinforcement Learning. Gieselmann, Robert; Pokorny, Florian T. Conference on Robot Learning, Vol. 229, 2023.
  • [33] DROID: Learning from Offline Heterogeneous Demonstrations via Reward-Policy Distillation. Jayanthi, Sravan; Chen, Letian; Balabanska, Nadya; Duong, Van; Scarlatescu, Erik; Ameperosa, Ezra; Zaidi, Zulfiqar; Martin, Daniel; Del Matto, Taylor; Ono, Masahiro; Gombolay, Matthew. Conference on Robot Learning, Vol. 229, 2023.
  • [34] Reward-Relevance-Filtered Linear Offline Reinforcement Learning. Zhou, Angela. International Conference on Artificial Intelligence and Statistics, Vol. 238, 2024.
  • [35] ELAPSE: Expand Latent Action Projection Space for policy optimization in Offline Reinforcement Learning. Han, Xinchen; Afifi, Hossam; Marot, Michel. Neurocomputing, 2025, 631.
  • [36] Learning to Influence Human Behavior with Offline Reinforcement Learning. Hong, Joey; Levine, Sergey; Dragan, Anca. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.
  • [37] Reinforcement Learning with Reward Shaping and Hybrid Exploration in Sparse Reward Scenes. Yang, Yulong; Cao, Weihua; Guo, Linwei; Gan, Chao; Wu, Min. 2023 IEEE 6th International Conference on Industrial Cyber-Physical Systems (ICPS), 2023.
  • [38] Optimizing Enhanced Cost Per Click via Reinforcement Learning Without Exploration. Li, Sinan; Yuan, Chun; Zhu, Xin. 2021 International Joint Conference on Neural Networks (IJCNN), 2021.
  • [39] Offline Reinforcement Learning with Behavior Value Regularization. Huang, Longyang; Dong, Botao; Xie, Wei; Zhang, Weidong. IEEE Transactions on Cybernetics, 2024, 54(6): 3692-3704.
  • [40] Adaptive Policy Learning for Offline-to-Online Reinforcement Learning. Zheng, Han; Luo, Xufang; Wei, Pengfei; Song, Xuan; Li, Dongsheng; Jiang, Jing. Thirty-Seventh AAAI Conference on Artificial Intelligence, Vol. 37, No. 9, 2023: 11372-11380.