Reward-free offline reinforcement learning: Optimizing behavior policy via action exploration

Cited by: 0
Authors
Huang, Zhenbo [1 ]
Sun, Shiliang [2 ]
Zhao, Jing [1 ]
Affiliations
[1] East China Normal Univ, Sch Comp Sci & Technol, Shanghai 200062, Peoples R China
[2] Shanghai Jiao Tong Univ, Dept Automat, Shanghai 200240, Peoples R China
Keywords
Offline reinforcement learning; Reward-free learning; Action exploration
DOI
10.1016/j.knosys.2024.112018
CLC classification
TP18 [Theory of Artificial Intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Offline reinforcement learning (RL) aims to learn a policy from pre-collected data, avoiding costly or risky interactions with the environment. In the offline setting, the inherent problem of distribution shift leads to extrapolation error, causing policy learning to fail. Conventional offline RL methods tackle this by reducing the value estimates of unseen actions or by incorporating policy constraints. However, these methods confine the agent's actions to the data manifold, hampering its capacity to gain new insights from actions beyond the dataset's scope. To address this, we propose a novel offline RL method that incorporates action exploration, called EoRL. We partition policy learning into behavior learning and exploration learning: exploration learning empowers the agent to discover novel actions, while behavior learning approximates the behavior policy. Specifically, in exploration learning, we define the deviation between decision actions and dataset actions as the action novelty and replace the traditional reward with an assessment of the policy's cumulative novelty. In behavior learning, the policy restricts actions to the vicinity of the dataset-supported actions, and the two parts of policy learning share parameters. We demonstrate that EoRL explores a larger action space while controlling the policy shift, and that its reward-free learning model is better suited to realistic task scenarios. Experimental results demonstrate the outstanding performance of our method on MuJoCo locomotion and 2D maze tasks.
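As a reading aid, the following is a minimal sketch, under stated assumptions, of the two coupled objectives the abstract describes: a behavior-learning term that keeps decided actions near dataset-supported actions, and an exploration-learning term that maximizes action novelty (the deviation between decided and dataset actions) in place of an environment reward. The shared-trunk network, the L2 form of the novelty, and the loss weighting are illustrative assumptions rather than the authors' implementation, and the per-batch novelty stands in for the cumulative novelty value the paper describes.

import torch
import torch.nn as nn

class SharedPolicy(nn.Module):
    """One trunk shared by behavior learning and exploration learning (assumed architecture)."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.behavior_head = nn.Linear(hidden, action_dim)     # approximates the behavior policy
        self.exploration_head = nn.Linear(hidden, action_dim)  # proposes novel actions

    def forward(self, state):
        h = self.trunk(state)
        return torch.tanh(self.behavior_head(h)), torch.tanh(self.exploration_head(h))

def eorl_style_losses(policy, states, dataset_actions, novelty_weight=1.0):
    """Behavior loss pulls decided actions toward dataset actions; the exploration loss
    rewards deviation from them (a per-batch stand-in for cumulative novelty)."""
    behavior_action, explore_action = policy(states)
    behavior_loss = ((behavior_action - dataset_actions) ** 2).mean()
    novelty = ((explore_action - dataset_actions) ** 2).sum(dim=-1).mean()  # assumed L2 novelty
    return behavior_loss, -novelty_weight * novelty

# Toy usage with random tensors, only to show the shapes involved.
policy = SharedPolicy(state_dim=17, action_dim=6)
states = torch.randn(32, 17)
dataset_actions = torch.rand(32, 6) * 2 - 1
b_loss, e_loss = eorl_style_losses(policy, states, dataset_actions)
(b_loss + e_loss).backward()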
Pages: 13
Related papers (50 total)
  • [31] Offline Reinforcement Learning with Failure Under Sparse Reward Environments. Wu, Mingkang; Siddique, Umer; Sinha, Abhinav; Cao, Yongcan. 2024 IEEE 3rd International Conference on Computing and Machine Intelligence (ICMI 2024), 2024.
  • [32] Expansive Latent Planning for Sparse Reward Offline Reinforcement Learning. Gieselmann, Robert; Pokorny, Florian T. Conference on Robot Learning, Vol. 229, 2023.
  • [33] DROID: Learning from Offline Heterogeneous Demonstrations via Reward-Policy Distillation. Jayanthi, Sravan; Chen, Letian; Balabanska, Nadya; Duong, Van; Scarlatescu, Erik; Ameperosa, Ezra; Zaidi, Zulfiqar; Martin, Daniel; Del Matto, Taylor; Ono, Masahiro; Gombolay, Matthew. Conference on Robot Learning, Vol. 229, 2023.
  • [34] Reward-Relevance-Filtered Linear Offline Reinforcement Learning. Zhou, Angela. International Conference on Artificial Intelligence and Statistics, Vol. 238, 2024.
  • [35] ELAPSE: Expand Latent Action Projection Space for policy optimization in Offline Reinforcement Learning. Han, Xinchen; Afifi, Hossam; Marot, Michel. Neurocomputing, 2025, 631.
  • [36] Learning to Influence Human Behavior with Offline Reinforcement Learning. Hong, Joey; Levine, Sergey; Dragan, Anca. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.
  • [37] Reinforcement Learning with Reward Shaping and Hybrid Exploration in Sparse Reward Scenes. Yang, Yulong; Cao, Weihua; Guo, Linwei; Gan, Chao; Wu, Min. 2023 IEEE 6th International Conference on Industrial Cyber-Physical Systems (ICPS), 2023.
  • [38] Optimizing Enhanced Cost Per Click via Reinforcement Learning Without Exploration. Li, Sinan; Yuan, Chun; Zhu, Xin. 2021 International Joint Conference on Neural Networks (IJCNN), 2021.
  • [39] Offline Reinforcement Learning with Behavior Value Regularization. Huang, Longyang; Dong, Botao; Xie, Wei; Zhang, Weidong. IEEE Transactions on Cybernetics, 2024, 54(6): 3692-3704.
  • [40] Adaptive Policy Learning for Offline-to-Online Reinforcement Learning. Zheng, Han; Luo, Xufang; Wei, Pengfei; Song, Xuan; Li, Dongsheng; Jiang, Jing. Thirty-Seventh AAAI Conference on Artificial Intelligence, Vol. 37, No. 9, 2023: 11372-11380.