Reward-free offline reinforcement learning: Optimizing behavior policy via action exploration

Times Cited: 0
Authors
Huang, Zhenbo [1 ]
Sun, Shiliang [2 ]
Zhao, Jing [1 ]
Affiliations
[1] East China Normal Univ, Sch Comp Sci & Technol, Shanghai 200062, Peoples R China
[2] Shanghai Jiao Tong Univ, Dept Automat, Shanghai 200240, Peoples R China
Keywords
Offline reinforcement learning; Reward-free learning; Action exploration;
DOI
10.1016/j.knosys.2024.112018
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Offline reinforcement learning (RL) aims to learn a policy from pre-collected data, avoiding costly or risky interactions with the environment. In the offline setting, the inherent problem of distribution shift leads to extrapolation error and, in turn, to policy learning failures. Conventional offline RL methods tackle this by reducing the value estimates of unseen actions or by imposing policy constraints. However, these methods confine the agent's actions to the data manifold, hampering its capacity to acquire fresh insights from actions beyond the dataset's scope. To address this, we propose a novel offline RL method incorporating action exploration, called EoRL. We partition policy learning into behavior learning and exploration learning, where exploration learning empowers the agent to discover novel actions, while behavior learning approximates the behavior policy. Specifically, in exploration learning, we define the deviation between decision actions and dataset actions as the action novelty, replacing the traditional reward with an assessment of the cumulative novelty of the policy. Additionally, the behavior policy restricts actions to the vicinity of dataset-supported actions, and the two parts of policy learning share parameters. We demonstrate EoRL's ability to explore a larger action space while controlling the policy shift, and its reward-free learning model is better suited to realistic task scenarios. Experimental results demonstrate the outstanding performance of our method on MuJoCo locomotion and 2D maze tasks.
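The central idea in the abstract (replace the environment reward with a novelty signal derived from how far the policy's action deviates from the dataset action, while a behavior term keeps actions near the dataset support and both objectives share parameters) can be illustrated with a minimal sketch. The sketch below is a hypothetical PyTorch illustration, not the authors' EoRL implementation: the names PolicyNet, action_novelty, and eorl_losses, the per-step L2 novelty, and the trade-off weight are all assumptions, and the paper's cumulative-novelty objective is approximated here by a single-step bonus.

```python
# Hypothetical sketch of novelty-as-reward with shared behavior/exploration
# parameters, loosely following the abstract. Names and hyperparameters are
# illustrative assumptions, not the paper's implementation.
import torch
import torch.nn as nn


class PolicyNet(nn.Module):
    """One policy network whose parameters are shared by the behavior
    (imitation) objective and the exploration (novelty) objective."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.body(state)


def action_novelty(policy_action: torch.Tensor,
                   dataset_action: torch.Tensor) -> torch.Tensor:
    """Novelty = deviation of the decision action from the dataset action.
    A per-step L2 distance stands in for the cumulative novelty signal
    that the paper uses in place of the environment reward."""
    return torch.norm(policy_action - dataset_action, dim=-1)


def eorl_losses(policy: PolicyNet,
                states: torch.Tensor,
                dataset_actions: torch.Tensor,
                novelty_weight: float = 0.1):
    """Behavior term keeps actions near dataset support; the exploration
    term rewards bounded novelty. The weight is a guessed trade-off."""
    actions = policy(states)
    behavior_loss = ((actions - dataset_actions) ** 2).mean()
    exploration_bonus = action_novelty(actions, dataset_actions).mean()
    # Maximize novelty while minimizing imitation error: subtract the bonus.
    total = behavior_loss - novelty_weight * exploration_bonus
    return total, behavior_loss, exploration_bonus


if __name__ == "__main__":
    torch.manual_seed(0)
    policy = PolicyNet(state_dim=17, action_dim=6)  # HalfCheetah-like sizes
    opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
    states = torch.randn(64, 17)                    # stand-in offline batch
    dataset_actions = torch.randn(64, 6).clamp(-1, 1)
    loss, bc, nov = eorl_losses(policy, states, dataset_actions)
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"total={loss.item():.4f} behavior={bc.item():.4f} novelty={nov.item():.4f}")
```

Because both terms are computed from the same PolicyNet output, the sketch reflects the abstract's parameter-sharing between behavior and exploration learning; how EoRL actually balances and accumulates the novelty signal is described in the paper itself.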
Pages: 13
Related Papers
50 records in total
  • [41] Reward Certification for Policy Smoothed Reinforcement Learning
    Mu, Ronghui
    Marcolino, Leandro Soriano
    Zhang, Yanghao
    Zhang, Tianle
    Huang, Xiaowei
    Ruan, Wenjie
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 19, 2024, : 21429 - 21437
  • [42] Efficient and Stable Offline-to-online Reinforcement Learning via Continual Policy Revitalization
    Kong, Rui
    Wu, Chenyang
    Gao, Chen-Xiao
    Zhang, Zongzhang
    Li, Ming
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024, : 4317 - 4325
  • [43] Reward Space Noise for Exploration in Deep Reinforcement Learning
    Sun, Chuxiong
    Wang, Rui
    Li, Qian
    Hu, Xiaohui
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2021, 35 (10)
  • [44] Provably Efficient Offline Reinforcement Learning With Trajectory-Wise Reward
    Xu, Tengyu
    Wang, Yue
    Zou, Shaofeng
    Liang, Yingbin
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2024, 70 (09) : 6481 - 6518
  • [45] Federated Offline Reinforcement Learning with Proximal Policy Evaluation
    Yue, Sheng
    Deng, Yongheng
    Wang, Guanbo
    Ren, Ju
    Zhang, Yaoxue
    CHINESE JOURNAL OF ELECTRONICS, 2024, 33 (06) : 1360 - 1372
  • [47] Diversification of Adaptive Policy for Effective Offline Reinforcement Learning
    Choi, Yunseon
    Zhao, Li
    Zhang, Chuheng
    Song, Lei
    Bian, Jiang
    Kim, Kee-Eung
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024, : 3863 - 3871
  • [48] OFFLINE REINFORCEMENT LEARNING WITH POLICY GUIDANCE AND UNCERTAINTY ESTIMATION
    Wu, Lan
    Liu, Quan
    Zhang, Lihua
    Huang, Zhigang
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 5010 - 5014
  • [49] UAC: Offline Reinforcement Learning With Uncertain Action Constraint
    Guan, Jiayi
    Gu, Shangding
    Li, Zhijun
    Hou, Jing
    Yang, Yiqin
    Chen, Guang
    Jiang, Changjun
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2024, 16 (02) : 671 - 680
  • [50] Offline Reinforcement Learning With Reverse Diffusion Guide Policy
    Zhang, Jiazhi
    Cheng, Yuhu
    Cao, Shuo
    Wang, Xuesong
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2024, 20 (10) : 11785 - 11793