Reward-free offline reinforcement learning: Optimizing behavior policy via action exploration

Cited by: 0
Authors
Huang, Zhenbo [1]
Sun, Shiliang [2]
Zhao, Jing [1]
Affiliations
[1] East China Normal Univ, Sch Comp Sci & Technol, Shanghai 200062, Peoples R China
[2] Shanghai Jiao Tong Univ, Dept Automat, Shanghai 200240, Peoples R China
Keywords
Offline reinforcement learning; Reward-free learning; Action exploration
DOI
10.1016/j.knosys.2024.112018
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Offline reinforcement learning (RL) aims to learn a policy from pre-collected data, avoiding costly or risky interactions with the environment. In the offline setting, distribution shift leads to extrapolation error, which can cause policy learning to fail. Conventional offline RL methods address this by lowering the value estimates of unseen actions or by adding policy constraints. However, these methods confine the agent's actions to the data manifold, limiting its capacity to learn from actions beyond the dataset's scope. To address this, we propose EoRL, a novel offline RL method that incorporates action exploration. We partition policy learning into behavior learning and exploration learning: exploration learning enables the agent to discover novel actions, while behavior learning approximates the behavior policy. Specifically, in exploration learning, we define the deviation between decision actions and dataset actions as the action novelty, and we replace the traditional reward with an assessment of the policy's cumulative novelty. In addition, the behavior policy restricts actions to the vicinity of dataset-supported actions, and the two parts of policy learning share parameters. We demonstrate EoRL's ability to explore a larger action space while controlling the policy shift, and its reward-free learning model is more compatible with realistic task scenarios. Experimental results demonstrate the outstanding performance of our method on MuJoCo locomotion and 2D maze tasks.
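The abstract describes action novelty as the deviation between the action a policy selects and the action recorded in the dataset, accumulated over a trajectory in place of a reward. The following is a minimal sketch of that idea only, not the authors' implementation: the abstract does not specify the deviation measure or whether the sum is discounted, so the squared Euclidean distance, the discount factor `gamma`, and the function names `action_novelty` and `cumulative_novelty` are all assumptions made for illustration.

```python
import numpy as np


def action_novelty(policy_actions, dataset_actions):
    """Per-step novelty as the squared Euclidean distance between the
    policy's action and the dataset action (assumed deviation measure)."""
    diff = np.asarray(policy_actions, dtype=float) - np.asarray(dataset_actions, dtype=float)
    return np.sum(diff ** 2, axis=-1)


def cumulative_novelty(novelty, gamma=0.99):
    """Discounted sum of per-step novelty along one trajectory.
    The discount factor gamma is an assumption, not taken from the abstract."""
    novelty = np.asarray(novelty, dtype=float)
    discounts = gamma ** np.arange(len(novelty))
    return float(np.sum(discounts * novelty))


# Toy trajectory of three steps with two-dimensional actions (made-up values).
policy_actions = np.array([[0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
dataset_actions = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, -0.5]])

novelty = action_novelty(policy_actions, dataset_actions)
total = cumulative_novelty(novelty, gamma=0.5)
```

In this sketch a step where the policy reproduces the dataset action contributes zero novelty, so a policy that only imitates the data accumulates nothing. This is consistent with the abstract's point that a separate behavior-learning component is needed to keep actions near the dataset-supported ones.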
Pages: 13