Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models

Authors
Byravan, Arunkumar [2]
Springenberg, Jost Tobias [1]
Abdolmaleki, Abbas [1]
Hafner, Roland [1]
Neunert, Michael [1]
Lampe, Thomas [1]
Siegel, Noah [1]
Heess, Nicolas [1]
Riedmiller, Martin [1]
Affiliations
[1] DeepMind, London, England
[2] Univ Washington, Seattle, WA 98195 USA
Keywords
RL; Model-Based RL; Transfer Learning; Visuomotor Control;
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203; 0835
Abstract
Humans are masters at quickly learning many complex tasks, relying on an approximate understanding of the dynamics of their environments. In much the same way, we would like our learning agents to quickly adapt to new tasks. In this paper, we explore how model-based Reinforcement Learning (RL) can facilitate transfer to new tasks. We develop an algorithm that learns an action-conditional, predictive model of expected future observations, rewards and values from which a policy can be derived by following the gradient of the estimated value along imagined trajectories. We show how robust policy optimization can be achieved in robot manipulation tasks even with approximate models that are learned directly from vision and proprioception. We evaluate the efficacy of our approach in a transfer learning scenario, re-using previously learned models on tasks with different reward structures and visual distractors, and show a significant improvement in learning speed compared to strong off-policy baselines. Videos with results can be found at https://sites.google.com/view/ivg-corl19
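The idea of deriving a policy by following the gradient of the estimated value along imagined trajectories can be illustrated with a toy sketch. This is not the paper's implementation: the 1-D linear latent model, the quadratic reward and terminal value, the linear policy u = -k*z, and all names and constants (A, B, C, P, GAMMA, imagined_value_and_grad, improve_policy) are assumptions chosen only to keep the example self-contained. The paper itself learns these components from vision and proprioception.

```python
# Toy illustration only. Hypothetical 1-D latent model, not the learned model
# from the paper: z' = A*z + B*u, reward r = -(z^2 + C*u^2),
# terminal value V(z) = -P*z^2, linear policy u = -k*z.
A, B, C, P, GAMMA = 0.9, 0.5, 0.1, 1.0, 0.99


def imagined_value_and_grad(k, z0, horizon):
    """Roll the model forward for `horizon` imagined steps.

    Returns the discounted imagined value and its exact derivative with
    respect to the policy gain k, obtained by propagating dz/dk through
    the model alongside the latent state.
    """
    z, dz = z0, 0.0              # latent state and its sensitivity dz/dk
    value, dvalue = 0.0, 0.0
    discount = 1.0
    for _ in range(horizon):
        u = -k * z               # policy action in the imagined state
        du = -z - k * dz         # d(u)/dk by the product rule
        r = -(z * z + C * u * u)
        dr = -(2 * z * dz + 2 * C * u * du)
        value += discount * r
        dvalue += discount * dr
        z, dz = A * z + B * u, A * dz + B * du   # imagined transition
        discount *= GAMMA
    # Bootstrap with the (assumed) terminal value estimate.
    value += discount * (-P * z * z)
    dvalue += discount * (-2 * P * z * dz)
    return value, dvalue


def improve_policy(k=0.0, z0=1.0, horizon=5, lr=0.02, steps=200):
    """Gradient ascent on the imagined value with respect to the gain k."""
    for _ in range(steps):
        _, grad = imagined_value_and_grad(k, z0, horizon)
        k += lr * grad
    return k
```

Under these assumptions, calling improve_policy() starting from k = 0 raises the gain and yields a higher imagined value than the initial policy. With a learned neural model, the hand-written derivatives would normally be replaced by automatic differentiation.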
Pages: 24