Model-based Policy Optimization with Unsupervised Model Adaptation

Cited by: 0

Authors:

Shen, Jian [1]

Zhao, Han [2,3]

Zhang, Weinan [1]

Yu, Yong [1]

Affiliations:

[1] Shanghai Jiao Tong University, Shanghai, People's Republic of China

[2] D. E. Shaw & Co., New York, NY, USA

[3] Carnegie Mellon University, Pittsburgh, PA 15213, USA

Source:

Advances in Neural Information Processing Systems 33, NeurIPS 2020 | 2020 / Vol. 33

Funding:

National Natural Science Foundation of China

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Model-based reinforcement learning methods learn a dynamics model from real data sampled from the environment and use it to generate simulated data for training an agent. However, the potential distribution mismatch between simulated and real data can degrade performance. Although much effort has been devoted to reducing this mismatch, existing methods do not address it explicitly. In this paper, we investigate how to bridge the gap between real and simulated data caused by inaccurate model estimation, for better policy optimization. We first derive a lower bound on the expected return, which naturally inspires a bound-maximization algorithm that aligns the simulated and real data distributions. To this end, we propose a novel model-based reinforcement learning framework, AMPO, which introduces unsupervised model adaptation to minimize the integral probability metric (IPM) between feature distributions from real and simulated data. Instantiating our framework with the Wasserstein-1 distance gives a practical model-based approach. Empirically, our approach achieves state-of-the-art sample efficiency on a range of continuous control benchmark tasks.
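
The abstract names the Wasserstein-1 distance as the IPM instance but gives no implementation details. As a rough illustration only, and not the AMPO procedure itself, the sketch below computes the empirical Wasserstein-1 distance in the simplest setting: scalar features and two samples of equal size, where the distance reduces to the mean absolute difference between sorted samples. The function name and the feature values are hypothetical.

```python
def wasserstein1_1d(xs, ys):
    """Empirical Wasserstein-1 distance between two equal-sized 1-D samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    # In one dimension the optimal transport plan pairs the sorted samples,
    # so the distance is the mean absolute gap between order statistics.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


# Hypothetical scalar features; real feature vectors would be multi-dimensional.
real_features = [0.0, 1.0, 2.0]       # assumed: features of real transitions
simulated_features = [1.0, 2.0, 3.0]  # assumed: features of model rollouts
gap = wasserstein1_1d(real_features, simulated_features)
```

A smaller value means the two feature samples are better aligned. For multi-dimensional features this closed form no longer applies, and the distance would have to be estimated some other way.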

Pages: 12
