Model-based Policy Optimization with Unsupervised Model Adaptation

Cited by: 0
Authors
Shen, Jian [1]
Zhao, Han [2, 3]
Zhang, Weinan [1]
Yu, Yong [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] D E Shaw & Co, New York, NY USA
[3] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Model-based reinforcement learning methods learn a dynamics model from real data sampled from the environment and use it to generate simulated data for training an agent. However, the potential distribution mismatch between simulated and real data can degrade performance. Despite much effort devoted to reducing this mismatch, existing methods fail to address it explicitly. In this paper, we investigate how to bridge the gap between real and simulated data caused by inaccurate model estimation, for better policy optimization. We first derive a lower bound of the expected return, which naturally inspires a bound maximization algorithm that aligns the simulated and real data distributions. To this end, we propose a novel model-based reinforcement learning framework, AMPO, which introduces unsupervised model adaptation to minimize the integral probability metric (IPM) between feature distributions from real and simulated data. Instantiating our framework with the Wasserstein-1 distance gives a practical model-based approach. Empirically, our approach achieves state-of-the-art sample efficiency on a range of continuous control benchmark tasks.
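As a rough illustration of the quantity the abstract refers to, the sketch below computes an empirical Wasserstein-1 distance between two equal-sized sets of one-dimensional "features". It is not the AMPO algorithm: the abstract does not say how the distance is estimated or minimized, and the function name `wasserstein1_1d`, the toy Gaussian data, and the 0.5 shift are all assumptions made for this example. For equal-sized 1-D samples, the Wasserstein-1 distance reduces to the mean absolute difference between the sorted samples.

```python
import random


def wasserstein1_1d(xs, ys):
    """Empirical Wasserstein-1 distance between two equal-sized 1-D samples.

    Hypothetical helper for illustration only; it is not taken from the paper.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    # In one dimension the optimal coupling matches the i-th smallest value
    # of one sample with the i-th smallest value of the other.
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


# Toy stand-ins for "real" and "simulated" feature values (assumed data).
rng = random.Random(0)
real_features = [rng.gauss(0.0, 1.0) for _ in range(1000)]
# A model with a constant bias of 0.5 shifts every simulated feature.
simulated_features = [x + 0.5 for x in real_features]

mismatch = wasserstein1_1d(real_features, simulated_features)
aligned = wasserstein1_1d(real_features, real_features)
print(f"W1(real, simulated) = {mismatch:.3f}")
print(f"W1(real, real)      = {aligned:.3f}")
```

A constant shift moves every sorted value by the same amount, so the distance here equals the size of the shift, and it drops to zero once the two samples coincide. A model adaptation step in the spirit of the abstract would aim to drive such a distance down, though on learned feature distributions rather than on toy scalars.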
Pages: 12