Model-based Policy Optimization with Unsupervised Model Adaptation

Times Cited: 0
Authors
Shen, Jian [1]
Zhao, Han [2,3]
Zhang, Weinan [1]
Yu, Yong [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] D E Shaw & Co, New York, NY USA
[3] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Model-based reinforcement learning methods learn a dynamics model from real data sampled from the environment and leverage it to generate simulated data for training an agent. However, the potential distribution mismatch between simulated and real data can degrade performance. Despite much effort devoted to reducing this mismatch, existing methods fail to address it explicitly. In this paper, we investigate how to bridge the gap between real and simulated data caused by inaccurate model estimation, so as to improve policy optimization. We first derive a lower bound on the expected return, which naturally motivates a bound-maximization algorithm that aligns the simulated and real data distributions. To this end, we propose AMPO, a novel model-based reinforcement learning framework that introduces unsupervised model adaptation to minimize the integral probability metric (IPM) between the feature distributions of real and simulated data. Instantiating the framework with the Wasserstein-1 distance yields a practical model-based algorithm. Empirically, our approach achieves state-of-the-art sample efficiency on a range of continuous control benchmark tasks.
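
The core mechanism described in the abstract, minimizing a Wasserstein-1 IPM between feature distributions of real and model-simulated transitions, can be sketched with the standard critic-based dual formulation (Kantorovich-Rubinstein duality with a gradient penalty to approximate the 1-Lipschitz constraint). The sketch below is a minimal illustration under assumed choices: the module names (FeatureEncoder, Critic), network sizes, the gp_coef penalty weight, and the alternating update schedule are all illustrative assumptions, not the authors' exact AMPO implementation. In the full method this adaptation step would be interleaved with supervised dynamics-model learning and policy optimization on model rollouts.

# Minimal sketch (assumptions noted above): align feature distributions of
# real and model-simulated data by minimizing an estimated Wasserstein-1 IPM.
import torch
import torch.nn as nn

class FeatureEncoder(nn.Module):
    # Maps transitions (s, a, s') to a shared feature space (hypothetical module).
    def __init__(self, in_dim, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, feat_dim))
    def forward(self, x):
        return self.net(x)

class Critic(nn.Module):
    # Scalar critic used in the dual form of the Wasserstein-1 distance.
    def __init__(self, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 1))
    def forward(self, z):
        return self.net(z)

def gradient_penalty(critic, real_feat, sim_feat):
    # Penalize deviations of the critic's gradient norm from 1 (WGAN-GP style),
    # a common surrogate for the 1-Lipschitz constraint in the dual objective.
    alpha = torch.rand(real_feat.size(0), 1)
    mix = (alpha * real_feat + (1 - alpha) * sim_feat).requires_grad_(True)
    grads = torch.autograd.grad(critic(mix).sum(), mix, create_graph=True)[0]
    return ((grads.norm(2, dim=1) - 1.0) ** 2).mean()

def adaptation_step(encoder, critic, enc_opt, critic_opt,
                    real_batch, sim_batch, gp_coef=10.0):
    # One alternating update: the critic maximizes the IPM estimate, while the
    # encoder (and, in the full method, the dynamics model) minimizes it.
    real_feat = encoder(real_batch)
    sim_feat = encoder(sim_batch)

    # Critic update: maximize E[f(real)] - E[f(sim)] subject to the penalty.
    critic_loss = -(critic(real_feat.detach()).mean()
                    - critic(sim_feat.detach()).mean())
    critic_loss = critic_loss + gp_coef * gradient_penalty(
        critic, real_feat.detach(), sim_feat.detach())
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Encoder update: minimize the estimated Wasserstein-1 distance so that
    # simulated features become indistinguishable from real ones.
    ipm = critic(encoder(real_batch)).mean() - critic(encoder(sim_batch)).mean()
    enc_opt.zero_grad(); ipm.backward(); enc_opt.step()
    return ipm.item()
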
Pages: 12