Model-based Policy Optimization with Unsupervised Model Adaptation

Cited by: 0
Authors
Shen, Jian [1 ]
Zhao, Han [2 ,3 ]
Zhang, Weinan [1 ]
Yu, Yong [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] D E Shaw & Co, New York, NY USA
[3] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Model-based reinforcement learning methods learn a dynamics model from real data sampled from the environment and leverage it to generate simulated data with which to train an agent. However, the potential distribution mismatch between simulated data and real data can degrade performance. Despite much effort devoted to reducing this mismatch, existing methods fail to address it explicitly. In this paper, we investigate how to bridge the gap between real and simulated data caused by inaccurate model estimation, in order to improve policy optimization. We first derive a lower bound on the expected return, which naturally inspires a bound-maximization algorithm that aligns the simulated and real data distributions. To this end, we propose AMPO, a novel model-based reinforcement learning framework that introduces unsupervised model adaptation to minimize the integral probability metric (IPM) between feature distributions of real and simulated data. Instantiating the framework with the Wasserstein-1 distance yields a practical model-based approach. Empirically, our approach achieves state-of-the-art sample efficiency on a range of continuous control benchmark tasks.
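The core idea in the abstract, aligning the feature distributions of real and simulated transitions by minimizing an IPM such as the Wasserstein-1 distance, can be illustrated with a short sketch. The PyTorch snippet below is a hypothetical, simplified illustration of such an alignment penalty, using a weight-clipped critic as a dual-form Wasserstein-1 surrogate; the names, network sizes, and training details are assumptions for illustration, not the authors' implementation.

    # Minimal sketch (assumed setup, not the AMPO code): align features of
    # real and simulated (state, action) batches via a dual-form W1 surrogate.
    import torch
    import torch.nn as nn

    class Encoder(nn.Module):
        """Maps (state, action) pairs to a shared feature space."""
        def __init__(self, in_dim, feat_dim=64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, feat_dim))
        def forward(self, x):
            return self.net(x)

    class Critic(nn.Module):
        """Critic kept (approximately) 1-Lipschitz by weight clipping."""
        def __init__(self, feat_dim=64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                     nn.Linear(128, 1))
        def forward(self, z):
            return self.net(z)

    def adaptation_step(encoder, critic, real_sa, sim_sa,
                        enc_opt, critic_opt, clip=0.01):
        """One alternating update: the critic maximizes the W1 surrogate,
        the encoder minimizes it, pulling the two feature distributions together."""
        # Critic update: maximize E[f(z_real)] - E[f(z_sim)]
        z_real, z_sim = encoder(real_sa).detach(), encoder(sim_sa).detach()
        critic_loss = -(critic(z_real).mean() - critic(z_sim).mean())
        critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
        for p in critic.parameters():          # enforce the Lipschitz constraint
            p.data.clamp_(-clip, clip)
        # Encoder update: minimize the same quantity (the IPM penalty)
        ipm = critic(encoder(real_sa)).mean() - critic(encoder(sim_sa)).mean()
        enc_opt.zero_grad(); ipm.backward(); enc_opt.step()
        return ipm.item()

    # Example usage with illustrative shapes:
    # enc, f = Encoder(in_dim=20), Critic()
    # opt_e = torch.optim.Adam(enc.parameters(), lr=1e-3)
    # opt_f = torch.optim.Adam(f.parameters(), lr=1e-3)
    # gap = adaptation_step(enc, f, torch.randn(256, 20), torch.randn(256, 20), opt_e, opt_f)

In the paper's full framework, this kind of alignment term would be trained alongside the dynamics model and policy updates; here it is shown in isolation only to make the IPM-minimization idea concrete.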
Pages: 12
Related Papers
50 items in total
  • [31] MODEL-BASED EVOLUTIONARY OPTIMIZATION
    Wang, Yongqiang
    Fu, Michael C.
    Marcus, Steven I.
    PROCEEDINGS OF THE 2010 WINTER SIMULATION CONFERENCE, 2010, : 1199 - 1210
  • [32] Model-Based Optimization for Robotics
    Mombaur, Katja
    Kheddar, Abderrahmane
    Harada, Kensuke
    Buschmann, Thomas
    Atkeson, Chris
    IEEE ROBOTICS & AUTOMATION MAGAZINE, 2014, 21 (03) : 24 - 161
  • [33] BiES: Adaptive Policy Optimization for Model-Based Offline Reinforcement Learning
    Yang, Yijun
    Jiang, Jing
    Wang, Zhuowei
    Duan, Qiqi
    Shi, Yuhui
    AI 2021: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, 13151 : 570 - 581
  • [34] Policy Optimization by Looking Ahead for Model-based Offline Reinforcement Learning
    Liu, Yang
    Hofert, Marius
    2024 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA 2024, 2024, : 2791 - 2797
  • [35] Conservative Dual Policy Optimization for Efficient Model-Based Reinforcement Learning
    Zhang, Shenao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [36] Unsupervised noise model estimation for model-based robust speech recognition
    Graciarena, M
    Franco, H
    ASRU'03: 2003 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING ASRU '03, 2003, : 351 - 356
  • [37] Dyna-style Model-based reinforcement learning with Model-Free Policy Optimization
    Dong, Kun
    Luo, Yongle
    Wang, Yuxin
    Liu, Yu
    Qu, Chengeng
    Zhang, Qiang
    Cheng, Erkang
    Sun, Zhiyong
    Song, Bo
    KNOWLEDGE-BASED SYSTEMS, 2024, 287
  • [38] Model-based transportation policy analysis
    Mirchandani, Pitu B.
    Head, K. Larry
    Boyce, David
    International Journal of Technology Management, 2000, 19 (03) : 507 - 531
  • [39] Model-based policy analysis - Introduction
    Bunn, DW
    Larsen, ER
    Vlahos, K
    ENERGY POLICY, 1997, 25 (03) : 271 - 272
  • [40] Model-based transportation policy analysis
    Mirchandani, PB
    Head, KL
    Boyce, D
    INTERNATIONAL JOURNAL OF TECHNOLOGY MANAGEMENT, 2000, 19 (3-5) : 507 - 531