Policy Optimization by Looking Ahead for Model-based Offline Reinforcement Learning

Cited by: 0
Authors
Liu, Yang [1 ]
Hofert, Marius [1 ]
Affiliations
[1] Univ Hong Kong, Dept Stat & Actuarial Sci, Hong Kong, Peoples R China
Keywords
GO
DOI
10.1109/ICRA57147.2024.10610966
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Offline reinforcement learning (RL) aims to optimize a policy from pre-collected data so as to maximize the cumulative reward obtained by performing a sequence of actions. Existing approaches learn a value function from historical data and then update the policy parameters by maximizing the value function at a single time step. Motivated by the gap between RL's objective of maximizing cumulative rewards and this greedy strategy, we propose policy optimization by looking ahead (POLA) to narrow the gap. Concretely, we optimize the policy on both the current state and future states, where the future states are predicted by a transition model. A trajectory contains many actions before the task is done, and taking the best action at each step does not guarantee an optimal trajectory overall; sub-optimal or even negative actions must occasionally be tolerated. Existing methods, however, focus on generating the action that maximizes the Q-value at each individual step, which motivates our looking-ahead approach. Besides, hidden confounding factors may affect the decision-making process. To that end, we incorporate the correlations among the dimensions of the state into the policy, providing it with more information about the environment when making decisions. Empirical results on the MuJoCo dataset show the effectiveness of the proposed approach.
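The abstract describes the looking-ahead objective only at a high level. As a rough sketch (not the authors' implementation), a POLA-style policy loss might score the policy's actions not just at the current state but along a short rollout of a learned dynamics model; the callables policy, q_net, and transition_model, the horizon, and the discounting below are all illustrative assumptions.

import torch

def pola_policy_loss(policy, q_net, transition_model, states,
                     horizon=3, discount=0.99):
    # Illustrative look-ahead objective: instead of maximizing the Q-value
    # at the current state alone, accumulate discounted Q-values along a
    # short rollout whose future states come from the learned model.
    loss = torch.zeros((), device=states.device)
    s = states
    for t in range(horizon + 1):
        a = policy(s)                       # action at the (predicted) state
        loss = loss - (discount ** t) * q_net(s, a).mean()
        s = transition_model(s, a)          # one-step dynamics prediction
    return loss                             # minimizing this ascends the summed Q-values

Because gradients flow through the transition model, the policy is updated with respect to predicted future states as well as the current one; a real model-based offline implementation would also need to control compounding model error, e.g. via short horizons or uncertainty penalties.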
Pages: 2791-2797
Number of pages: 7
Related Papers
50 records in total
  • [31] Deep Model-Based Reinforcement Learning via Estimated Uncertainty and Conservative Policy Optimization
    Zhou, Qi
    Li, HouQiang
    Wang, Jie
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34: 6941-6948
  • [32] On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning
    Zhang, Baohe
    Rajan, Raghu
    Pineda, Luis
    Lambert, Nathan
    Biedenkapp, Andre
    Chua, Kurtland
    Hutter, Frank
    Calandra, Roberto
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [33] Uncertainty-Aware Model-Based Offline Reinforcement Learning for Automated Driving
    Diehl, Christopher
    Sievernich, Timo Sebastian
    Kruger, Martin
    Hoffmann, Frank
    Bertram, Torsten
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2023, 8 (2): 1167-1174
  • [34] RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning
    Rigter, Marc
    Lacerda, Bruno
    Hawes, Nick
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [35] On the Sample Complexity of Vanilla Model-Based Offline Reinforcement Learning with Dependent Samples
    Karabag, Mustafa O.
    Topcu, Ufuk
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2023, 37 (7): 8195-8202
  • [36] Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief
    Guo, Kaiyang
    Shao, Yunfeng
    Geng, Yanhui
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [37] Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games
    Yan, Yuling
    Li, Gen
    Chen, Yuxin
    Fan, Jianqing
    OPERATIONS RESEARCH, 2024, 72 (6): 2430-2445
  • [38] Bidirectional Learning for Offline Infinite-width Model-based Optimization
    Chen, Can
    Zhang, Yingxue
    Fu, Jie
    Liu, Xue
    Coates, Mark
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [39] Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm
    Jayant, Ashish Kumar
    Bhatnagar, Shalabh
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [40] Model-Based Reinforcement Learning for Quantized Federated Learning Performance Optimization
    Yang, Nuocheng
    Wang, Sihua
    Chen, Mingzhe
    Brinton, Christopher G.
    Yin, Changchuan
    Saad, Walid
    Cui, Shuguang
    2022 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM 2022), 2022: 5063-5068