Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models

Cited by: 0
Authors
Byravan, Arunkumar [2 ]
Springenberg, Jost Tobias [1 ]
Abdolmaleki, Abbas [1 ]
Hafner, Roland [1 ]
Neunert, Michael [1 ]
Lampe, Thomas [1 ]
Siegel, Noah [1 ]
Heess, Nicolas [1 ]
Riedmiller, Martin [1 ]
Institutions
[1] DeepMind, London, England
[2] Univ Washington, Seattle, WA 98195 USA
Keywords
RL; Model-Based RL; Transfer Learning; Visuomotor Control;
DOI
Not available
Chinese Library Classification
TP39 [Computer Applications]
Subject Classification Codes
081203; 0835
Abstract
Humans are masters at quickly learning many complex tasks, relying on an approximate understanding of the dynamics of their environments. In much the same way, we would like our learning agents to quickly adapt to new tasks. In this paper, we explore how model-based Reinforcement Learning (RL) can facilitate transfer to new tasks. We develop an algorithm that learns an action-conditional, predictive model of expected future observations, rewards and values from which a policy can be derived by following the gradient of the estimated value along imagined trajectories. We show how robust policy optimization can be achieved in robot manipulation tasks even with approximate models that are learned directly from vision and proprioception. We evaluate the efficacy of our approach in a transfer learning scenario, re-using previously learned models on tasks with different reward structures and visual distractors, and show a significant improvement in learning speed compared to strong off-policy baselines. Videos with results can be found at https://sites.google.com/view/ivg-corl19
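The abstract describes deriving a policy by following the gradient of the estimated value along trajectories imagined under a learned latent dynamics model. A minimal sketch of that core idea, assuming a hand-specified scalar latent dynamics z' = A·z + B·u and reward r(z) = -z² in place of the paper's learned, vision-based model (A, B, the horizon H, and the step size are illustrative assumptions, not the paper's method):

```python
# Sketch of "imagined value gradients": roll a (here, hand-specified)
# latent dynamics model forward, then improve the action sequence by
# ascending the analytic gradient of the imagined return.

A, B = 0.9, 0.5      # assumed scalar latent dynamics: z' = A*z + B*u
H = 5                # imagination horizon
LR = 0.1             # gradient-ascent step size on the actions

def imagined_return(z0, actions):
    """Roll the model forward from z0 and sum rewards r(z) = -z^2."""
    z, ret = z0, 0.0
    for u in actions:
        z = A * z + B * u
        ret += -z * z
    return ret

def return_gradient(z0, actions):
    """Gradient of the imagined return w.r.t. each action, computed by
    reverse-mode chain rule through the rollout."""
    zs = [z0]                        # forward pass: store latents
    for u in actions:
        zs.append(A * zs[-1] + B * u)
    grads = [0.0] * len(actions)
    dz = 0.0                         # d(return)/d(z_{t+1}) from later steps
    for t in reversed(range(len(actions))):
        dz += -2.0 * zs[t + 1]       # reward gradient at step t+1
        grads[t] = B * dz            # d z_{t+1} / d u_t = B
        dz *= A                      # propagate through z_{t+1} = A*z_t + B*u_t
    return grads

# optimize an open-loop action sequence from initial latent z0 = 1.0
actions = [0.0] * H
for _ in range(200):
    g = return_gradient(1.0, actions)
    actions = [u + LR * du for u, du in zip(actions, g)]

print(imagined_return(1.0, actions))
```

The paper instead backpropagates through a learned action-conditional model and a learned value function to update a parametric policy; the sketch only shows the shared mechanism of differentiating an imagined return with respect to the actions that generated it.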
Pages: 24