Model-based Policy Optimization with Unsupervised Model Adaptation

Citations: 0
Authors
Shen, Jian [1]
Zhao, Han [2,3]
Zhang, Weinan [1]
Yu, Yong [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] D E Shaw & Co, New York, NY USA
[3] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
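Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract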
Model-based reinforcement learning methods learn a dynamics model from real data sampled from the environment and use it to generate simulated data for training an agent. However, a distribution mismatch between simulated and real data can degrade performance. Although much effort has been devoted to reducing this mismatch, existing methods do not address it explicitly. In this paper, we investigate how to bridge the gap between real and simulated data caused by inaccurate model estimation, with the goal of better policy optimization. We first derive a lower bound on the expected return, which naturally inspires a bound-maximization algorithm that aligns the simulated and real data distributions. To this end, we propose AMPO, a novel model-based reinforcement learning framework that introduces unsupervised model adaptation to minimize the integral probability metric (IPM) between feature distributions from real and simulated data. Instantiating the framework with the Wasserstein-1 distance gives a practical model-based approach. Empirically, our approach achieves state-of-the-art sample efficiency on a range of continuous control benchmark tasks.
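For intuition about the distance named in the abstract, the following is a minimal sketch and not the AMPO implementation. An integral probability metric compares two distributions by the largest gap in expectations over a class of test functions; taking that class to be the 1-Lipschitz functions yields the Wasserstein-1 distance. In one dimension, with two equal-sized samples, Wasserstein-1 reduces to the mean absolute difference between the sorted values. The function name `wasserstein1_1d` and the toy inputs are assumptions made for illustration; the abstract does not say how AMPO estimates the distance over feature distributions.

```python
def wasserstein1_1d(xs, ys):
    """Wasserstein-1 distance between two equal-sized 1-D empirical samples.

    Illustrative helper (hypothetical name, not from the paper). For equal
    sample sizes in one dimension, the optimal coupling pairs the i-th
    smallest value of one sample with the i-th smallest of the other.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


# Toy stand-ins for scalar features of real and simulated data (assumed values).
real = [0.0, 1.0, 2.0]
simulated = [1.5, 0.5, 2.5]
print(wasserstein1_1d(real, simulated))  # 0.5
```

A distance of zero means the two samples coincide as empirical distributions; larger values indicate a larger mismatch of the kind the abstract aims to reduce.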
Pages: 12
Related papers
50 in total
  • [41] Unsupervised Music Genre Classification with a Model-Based Approach
    Barreira, Luis
    Cavaco, Sofia
    da Silva, Joaquim Ferreira
    PROGRESS IN ARTIFICIAL INTELLIGENCE-BOOK, 2011, 7026 : 268 - 281
  • [42] Model-Based Curve Clustering Using Unsupervised Learning
    Pesout, Pavel
    APPLICATIONS OF MATHEMATICS AND STATISTICS IN ECONOMY: AMSE 2009, 2009, : 361 - 369
  • [43] Model-based automatic neighborhood design by unsupervised learning
    Ghiani, Gianpaolo
    Laporte, Gilbert
    Manni, Emanuele
    COMPUTERS & OPERATIONS RESEARCH, 2015, 54 : 108 - 116
  • [44] Unsupervised Acoustic Model Adaptation Based on Ensemble Methods
    Shinozaki, Takahiro
    Kubota, Yu
    Furui, Sadaoki
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2010, 4 (06) : 1007 - 1015
  • [45] Unsupervised language model adaptation
    Bacchiani, M
    Roark, B
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING I, 2003, : 224 - 227
  • [46] Federated Adaptation for Foundation Model-based Recommendations
    Zhang, Chunxu
    Long, Guodong
    Guo, Hongkuan
    Fang, Xiao
    Song, Yang
    Liu, Zhaojie
    Zhou, Guorui
    Zhang, Zijian
    Liu, Yang
    Yang, Bo
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024, : 5453 - 5461
  • [47] Model-based adaptation of behavioral mismatching components
    Canal, Carlos
    Poizat, Pascal
    Salaun, Gwen
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2008, 34 (04) : 546 - 563
  • [48] Model-Based Test Adaptation for Smart TVs
    Firat, Atil
    Azimi, Mohammad Yusaf
    Elgun, Celal Cagin
    Erata, Ferhat
    Yilmaz, Cemal
    3RD ACM/IEEE INTERNATIONAL CONFERENCE ON AUTOMATION OF SOFTWARE TEST (AST 2022), 2022, : 52 - 53
  • [49] Model-based analysis of dynamics in vergence adaptation
    Yuan, WH
    Semmlow, JL
    Muller-Munoz, P
    IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, 2001, 48 (12) : 1402 - 1411
  • [50] Model-Based Metacontrol for Self-adaptation
    Hernandez, Carlos
    Fernandez, Jose L.
    Sanchez-Escribano, Guadalupe
    Bermejo-Alonso, Julita
    Sanz, Ricardo
    INTELLIGENT ROBOTICS AND APPLICATIONS, ICIRA 2015, PT I, 2015, 9244 : 643 - 654