Policy Optimization by Looking Ahead for Model-based Offline Reinforcement Learning

Cited by: 0
Authors: Liu, Yang [1]; Hofert, Marius [1]
Affiliations: [1] Univ Hong Kong, Dept Stat & Actuarial Sci, Hong Kong, Peoples R China
Keywords: GO
DOI: 10.1109/ICRA57147.2024.10610966
CLC Number: TP [Automation Technology; Computer Technology]
Discipline Code: 0812
Abstract
Offline reinforcement learning (RL) aims to optimize a policy from pre-collected data so as to maximize the cumulative reward of a sequence of actions. Existing approaches learn a value function from the historical data and then update the policy parameters by maximizing the value function at a single time step. Motivated by the gap between the RL objective of maximizing cumulative rewards and this greedy strategy, we propose policy optimization by looking ahead (POLA) to narrow the gap. Concretely, we optimize the policy on both the current state and future states, where the future states are predicted by a transition model. A trajectory contains many actions before the task is done, and taking the best action at every step does not guarantee an optimal trajectory overall; sub-optimal or even negative actions must occasionally be tolerated. Existing methods, however, generate the action that maximizes the Q-value at each individual step, which motivates our looking-ahead approach. In addition, hidden confounding factors may affect the decision-making process. To address this, we incorporate the correlations among the dimensions of the state into the policy, providing it with more information about the environment when making decisions. Empirical results on the MuJoCo datasets show the effectiveness of the proposed approach.
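To make the looking-ahead idea concrete, below is a minimal PyTorch sketch of a policy update that maximizes Q-values not only at the current state but also along a short rollout imagined with a learned transition model. This is an illustrative reading of the abstract, not the authors' implementation: the network architectures and all names and hyperparameters (policy, q_net, dynamics, HORIZON, GAMMA, the state/action sizes) are assumptions.

    import torch
    import torch.nn as nn

    STATE_DIM, ACTION_DIM = 17, 6   # placeholder sizes for a MuJoCo-like task (assumption)
    HORIZON, GAMMA = 3, 0.99        # look-ahead depth and discount (assumption)

    # Illustrative stand-ins for the learned components: a deterministic policy,
    # a Q-network, and a one-step transition model fit on the offline dataset.
    policy = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                           nn.Linear(64, ACTION_DIM), nn.Tanh())
    q_net = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
                          nn.Linear(64, 1))
    dynamics = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
                             nn.Linear(64, STATE_DIM))

    def lookahead_policy_loss(states):
        # Instead of maximizing Q at the current state only (the greedy update),
        # roll the policy forward through the learned transition model and
        # maximize the discounted sum of Q-values along the imagined trajectory.
        loss, s = 0.0, states
        for k in range(HORIZON + 1):
            a = policy(s)
            loss = loss - GAMMA ** k * q_net(torch.cat([s, a], dim=-1)).mean()
            s = dynamics(torch.cat([s, a], dim=-1))  # predicted next state
        return loss

    # One gradient step on a batch of offline states (random placeholder data).
    optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
    batch = torch.randn(256, STATE_DIM)
    optimizer.zero_grad()
    lookahead_policy_loss(batch).backward()
    optimizer.step()

With HORIZON = 0 this reduces to the single-step greedy update the abstract criticizes; larger horizons trade model error for longer-term credit assignment.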
Pages: 2791 - 2797
Page count: 7
Related Papers (10 of 50 shown)
  • [1] BiES: Adaptive Policy Optimization for Model-Based Offline Reinforcement Learning
    Yang, Yijun
    Jiang, Jing
    Wang, Zhuowei
    Duan, Qiqi
    Shi, Yuhui
    AI 2021: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, 13151 : 570 - 581
  • [2] Model-Based Offline Reinforcement Learning with Uncertainty Estimation and Policy Constraint
Zhu, J.
Du, C.
Dullerud, G. E.
IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, 2024, 5 (12): 1 - 13
  • [3] MOReL: Model-Based Offline Reinforcement Learning
    Kidambi, Rahul
    Rajeswaran, Aravind
    Netrapalli, Praneeth
    Joachims, Thorsten
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [4] Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning
    Yang, Shentao
    Feng, Yihao
    Zhang, Shujian
    Zhou, Mingyuan
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [5] MOPO: Model-based Offline Policy Optimization
    Yu, Tianhe
    Thomas, Garrett
    Yu, Lantao
    Ermon, Stefano
    Zou, James
    Levine, Sergey
    Finn, Chelsea
    Ma, Tengyu
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [6] Model-Based Reinforcement Learning via Proximal Policy Optimization
    Sun, Yuewen
    Yuan, Xin
    Liu, Wenzhang
    Sun, Changyin
    2019 CHINESE AUTOMATION CONGRESS (CAC2019), 2019, : 4736 - 4740
  • [7] Offline Model-based Adaptable Policy Learning
    Chen, Xiong-Hui
    Yu, Yang
    Li, Qingyang
    Luo, Fan-Ming
    Qin, Zhiwei
    Shang, Wenjie
    Ye, Jieping
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [8] Offline Reinforcement Learning with Reverse Model-based Imagination
    Wang, Jianhao
    Li, Wenzhe
    Jiang, Haozhe
    Zhu, Guangxiang
    Li, Siyuan
    Zhang, Chongjie
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [9] Offline Model-Based Reinforcement Learning for Tokamak Control
    Char, Ian
    Abbate, Joseph
    Bardoczi, Laszlo
    Boyer, Mark D.
    Chung, Youngseog
    Conlin, Rory
    Erickson, Keith
    Mehta, Viraj
    Richner, Nathan
    Kolemen, Egemen
    Schneider, Jeff
    LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, VOL 211, 2023, 211
  • [10] Model-Based Offline Reinforcement Learning with Local Misspecification
    Dong, Kefan
    Flet-Berliac, Yannis
    Nie, Allen
    Brunskill, Emma
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 6, 2023, : 7423 - 7431