Policy Optimization by Looking Ahead for Model-based Offline Reinforcement Learning

Times Cited: 0
|
Authors
Liu, Yang [1 ]
Hofert, Marius [1 ]
Affiliations
[1] Univ Hong Kong, Dept Stat & Actuarial Sci, Hong Kong, Peoples R China
Keywords
GO;
DOI
10.1109/ICRA57147.2024.10610966
CLC Number (Chinese Library Classification)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Offline reinforcement learning (RL) aims to optimize a policy, based on pre-collected data, that maximizes the cumulative reward obtained by performing a sequence of actions. Existing approaches learn a value function from historical data and then update the policy parameters by maximizing the value function at a single time step. Motivated by the gap between RL's objective of maximizing cumulative rewards and this greedy, one-step strategy, we propose policy optimization by looking ahead (POLA) to mitigate the gap. Concretely, we optimize the policy on both the current state and future states, where the future states are predicted by a transition model. A trajectory contains many actions before the task is completed, and taking the best action at every step does not guarantee an optimal trajectory overall; sub-optimal or even negative actions must occasionally be allowed. Existing methods, however, focus on generating the optimal action at each step according to the principle of maximizing the Q-value, which motivates our looking-ahead approach. In addition, hidden confounding factors may affect the decision-making process. To address this, we incorporate the correlations among the dimensions of the state into the policy, providing it with more information about the environment when making decisions. Empirical results on the MuJoCo datasets show the effectiveness of the proposed approach.
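The distinction the abstract draws between the one-step greedy objective and optimizing over model-predicted future states is easy to state in code. Below is a minimal sketch of such a looking-ahead policy loss, assuming a pre-trained critic `q_net`, a learned transition model `dynamics`, and a deterministic policy `policy`; all names, signatures, and the simple discounted sum are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a looking-ahead policy objective (hypothetical
# names; not the authors' code). The policy is updated on the current
# state AND on future states rolled out with a learned dynamics model.
import torch

def lookahead_policy_loss(policy, q_net, dynamics, states,
                          horizon=1, gamma=0.99):
    """Negative discounted sum of Q-values along a short model rollout."""
    loss = torch.zeros(())
    s = states
    discount = 1.0
    for _ in range(horizon + 1):   # current state + `horizon` lookahead steps
        a = policy(s)              # action proposed by the current policy
        loss = loss - discount * q_net(s, a).mean()  # maximize Q <=> minimize -Q
        s = dynamics(s, a)         # model-predicted next state
        discount *= gamma
    return loss
```

With `horizon=0` this reduces to the standard one-step objective of maximizing Q(s, pi(s)); the extra terms fold the model-predicted future states into the same policy update, which is the looking-ahead idea the abstract describes.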
Pages: 2791-2797
Number of Pages: 7