Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models

Cited by: 0
Authors
Byravan, Arunkumar [2 ]
Springenberg, Jost Tobias [1 ]
Abdolmaleki, Abbas [1 ]
Hafner, Roland [1 ]
Neunert, Michael [1 ]
Lampe, Thomas [1 ]
Siegel, Noah [1 ]
Heess, Nicolas [1 ]
Riedmiller, Martin [1 ]
Affiliations
[1] DeepMind, London, England
[2] Univ Washington, Seattle, WA 98195 USA
Keywords
RL; Model-Based RL; Transfer Learning; Visuomotor Control
DOI
Not available
Chinese Library Classification (CLC)
TP39 [Computer Applications]
Discipline Codes
081203; 0835
Abstract
Humans are masters at quickly learning many complex tasks, relying on an approximate understanding of the dynamics of their environments. In much the same way, we would like our learning agents to quickly adapt to new tasks. In this paper, we explore how model-based Reinforcement Learning (RL) can facilitate transfer to new tasks. We develop an algorithm that learns an action-conditional, predictive model of expected future observations, rewards and values from which a policy can be derived by following the gradient of the estimated value along imagined trajectories. We show how robust policy optimization can be achieved in robot manipulation tasks even with approximate models that are learned directly from vision and proprioception. We evaluate the efficacy of our approach in a transfer learning scenario, re-using previously learned models on tasks with different reward structures and visual distractors, and show a significant improvement in learning speed compared to strong off-policy baselines. Videos with results can be found at https://sites.google.com/view/ivg-corl19
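The abstract describes policy improvement by following the gradient of a learned value estimate along trajectories imagined with a learned latent dynamics model. Below is a minimal sketch (not the authors' implementation) of that core idea in PyTorch: learned dynamics, reward and value modules are rolled forward in latent space for a short horizon under the current policy, and the gradient of the predicted N-step return is back-propagated into the policy parameters. All module architectures, dimensions and hyperparameters here are illustrative assumptions.
```python
import torch
import torch.nn as nn

LATENT, ACTION, H, GAMMA = 64, 8, 5, 0.99  # assumed sizes, horizon and discount

# Illustrative stand-ins for the learned components (policy, latent dynamics, reward, value).
policy   = nn.Sequential(nn.Linear(LATENT, 128), nn.ELU(), nn.Linear(128, ACTION), nn.Tanh())
dynamics = nn.Sequential(nn.Linear(LATENT + ACTION, 128), nn.ELU(), nn.Linear(128, LATENT))  # h_{t+1} = f(h_t, a_t)
reward   = nn.Sequential(nn.Linear(LATENT + ACTION, 128), nn.ELU(), nn.Linear(128, 1))       # r_t = r(h_t, a_t)
value    = nn.Sequential(nn.Linear(LATENT, 128), nn.ELU(), nn.Linear(128, 1))                 # V(h_t)

def imagined_value(h0):
    """Estimate an N-step return by imagining H steps forward in latent space."""
    h, ret, discount = h0, 0.0, 1.0
    for _ in range(H):
        a = policy(h)                              # action proposed by the current policy
        ha = torch.cat([h, a], dim=-1)
        ret = ret + discount * reward(ha)          # accumulate predicted reward
        h = dynamics(ha)                           # imagined next latent state
        discount *= GAMMA
    return ret + discount * value(h)               # bootstrap the tail with the learned value

opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
h0 = torch.randn(32, LATENT)                       # placeholder for latents encoded from vision/proprioception
loss = -imagined_value(h0).mean()                  # ascend the imagined value via its gradient
opt.zero_grad()
loss.backward()                                    # gradient flows through dynamics, reward and value models
opt.step()
```
In the paper the latent state would come from an encoder over vision and proprioception and the model components are trained from replayed experience; only the policy-update step is sketched here.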
Pages: 24
Related Papers
50 in total
• [1] Su, Xiaolong; Li, Peng; Chen, Shaofei. Moor: Model-based offline policy optimization with a risk dynamics model. Complex & Intelligent Systems, 2025, 11(01).
• [2] Pan, Feiyang; Cai, Qingpeng; Zeng, An-Xiang; Pan, Chun-Xiang; Da, Qing; He, Hualin; He, Qing; Tang, Pingzhong. Policy Optimization with Model-Based Explorations. Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19), 2019: 4675-4682.
• [3] Lai, Hang; Shen, Jian; Zhang, Weinan; Yu, Yong. Bidirectional Model-based Policy Optimization. International Conference on Machine Learning (ICML), Vol. 119, 2020.
• [4] Chow, Yinlam; Cui, Brandon; Ryu, MoonKyung; Ghavamzadeh, Mohammad. Variational Model-based Policy Optimization. Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI 2021), 2021: 2292-2299.
• [5] Hao, Zhifeng; Zhu, Haipeng; Chen, Wei; Cai, Ruichu. Latent Causal Dynamics Model for Model-Based Reinforcement Learning. Neural Information Processing, ICONIP 2023, Part II, 2024, 14448: 219-230.
• [6] Scribano, Carmelo; Pezzi, Danilo; Franchini, Giorgia; Prato, Marco. Denoising Diffusion Models on Model-Based Latent Space. Algorithms, 2023, 16(11).
• [7] Shen, Jian; Zhao, Han; Zhang, Weinan; Yu, Yong. Model-based Policy Optimization with Unsupervised Model Adaptation. Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020.
• [8] Li, Shuailong; Zhang, Wei; Zhang, Huiwen; Zhang, Xin; Leng, Yuquan. Proximal policy optimization with model-based methods. Journal of Intelligent & Fuzzy Systems, 2022, 42(06): 5399-5410.
• [9] Yu, Tianhe; Thomas, Garrett; Yu, Lantao; Ermon, Stefano; Zou, James; Levine, Sergey; Finn, Chelsea; Ma, Tengyu. MOPO: Model-based Offline Policy Optimization. Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020.
• [10] Shen, Jian; Lai, Hang; Liu, Minghuan; Zhao, Han; Yu, Yong; Zhang, Weinan. Adaptation Augmented Model-based Policy Optimization. Journal of Machine Learning Research, 2023, 24.