Policy Optimization by Looking Ahead for Model-based Offline Reinforcement Learning

Times Cited: 0
Authors
Liu, Yang [1 ]
Hofert, Marius [1 ]
Affiliations
[1] Univ Hong Kong, Dept Stat & Actuarial Sci, Hong Kong, Peoples R China
Keywords
GO
DOI
10.1109/ICRA57147.2024.10610966
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Offline reinforcement learning (RL) aims to optimize a policy from pre-collected data so as to maximize the cumulative reward obtained after performing a sequence of actions. Existing approaches learn a value function from the historical data and then update the policy parameters by maximizing that value function at a single time step. Motivated by the gap between RL's goal of maximizing cumulative rewards and this greedy strategy, we propose policy optimization by looking ahead (POLA) to narrow the gap. Concretely, we optimize the policy on both the current state and future states, where the future states are predicted by a learned transition model. A trajectory contains many actions before the task is completed, and taking the best action at every step does not guarantee an optimal trajectory overall; sub-optimal or even negative actions must occasionally be allowed. Existing methods, however, focus on generating the action that maximizes the Q-value at each step, which motivates our looking-ahead approach. In addition, hidden confounding factors may affect the decision-making process. To address this, we incorporate the correlations among the dimensions of the state into the policy, providing it with more information about the environment when making decisions. Empirical results on the MuJoCo datasets show the effectiveness of the proposed approach.
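The look-ahead objective sketched in the abstract can be illustrated with a short snippet. This is a minimal sketch, not the authors' implementation: it assumes a hypothetical policy network policy, critic q_net, and learned transition model dynamics (all names invented for illustration), and averages the critic value over the current state and a few model-predicted future states rather than the current state alone.

# Minimal sketch of a look-ahead policy objective (hypothetical names;
# not the authors' reference code).
import torch

def pola_policy_loss(policy, q_net, dynamics, states, horizon=2, lam=0.5):
    """Negative Q-value accumulated over the current batch of states and a
    short rollout of states predicted by the learned transition model."""
    loss = torch.zeros((), device=states.device)
    s, weight = states, 1.0
    for _ in range(horizon + 1):
        a = policy(s)                              # actions proposed by the policy
        loss = loss - weight * q_net(s, a).mean()  # maximize Q  ->  minimize -Q
        s = dynamics(s, a)                         # look ahead via the transition model
        weight = weight * lam                      # down-weight farther look-ahead terms
    return loss

# Usage (schematic): loss = pola_policy_loss(policy, q_net, dynamics, batch_states)
#                    loss.backward(); policy_optimizer.step()

Setting horizon = 0 recovers the single-step greedy update of existing methods; larger horizons trade the transition model's prediction error against the benefit of looking ahead.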
Pages: 2791-2797
Number of pages: 7
Related Papers
50 records in total
  • [41] Offline Model-Based Optimization via Policy-Guided Gradient Search
    Chemingui, Yassine
    Deshwal, Aryan
    Hoang, Trong Nghia
    Doppa, Janardhan Rao
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 10, 2024, : 11230 - 11239
  • [42] Efficient hyperparameter optimization through model-based reinforcement learning
    Wu, Jia
    Chen, SenPeng
    Liu, XiYuan
    NEUROCOMPUTING, 2020, 409 : 381 - 393
  • [43] Model-Based Reinforcement Learning Method for Microgrid Optimization Scheduling
    Yao, Jinke
    Xu, Jiachen
    Zhang, Ning
    Guan, Yajuan
    SUSTAINABILITY, 2023, 15 (12)
  • [44] Deep Reinforcement Learning with Model-based Acceleration for Hyperparameter Optimization
    Chen, SenPeng
    Wu, Jia
    Chen, XiuYun
    2019 IEEE 31ST INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2019), 2019, : 170 - 177
  • [45] Model-Based Meta-reinforcement Learning for Hyperparameter Optimization
    Albrechts, Jeroen
    Martin, Hugo M.
    Tavakol, Maryam
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2024, PT I, 2025, 15346 : 27 - 39
  • [46] Model-based offline reinforcement learning framework for optimizing tunnel boring machine operation
    Cao, Yupeng
    Luo, Wei
    Xue, Yadong
    Lin, Weiren
    Zhang, Feng
    UNDERGROUND SPACE, 2024, 19 : 47 - 71
  • [47] Differentiable Physics Models for Real-world Offline Model-based Reinforcement Learning
    Lutter, Michael
    Silberbauer, Johannes
    Watson, Joe
    Peters, Jan
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 4163 - 4170
  • [48] Constrained Policy Optimization with Explicit Behavior Density for Offline Reinforcement Learning
    Zhang, Jing
    Zhang, Chi
    Wang, Wenjia
    Jing, Bing-Yi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [49] Importance-Weighted Variational Inference Model Estimation for Offline Bayesian Model-Based Reinforcement Learning
    Hishinuma, Toru
    Senda, Kei
    IEEE ACCESS, 2023, 11 : 145579 - 145590
  • [50] Energy-Based Policy Constraint for Offline Reinforcement Learning
    Peng, Zhiyong
    Han, Changlin
    Liu, Yadong
    Zhou, Zongtan
    ARTIFICIAL INTELLIGENCE, CICAI 2023, PT II, 2024, 14474 : 335 - 346