Model-Based Reinforcement Learning With Isolated Imaginations

Citations: 0
Authors
Pan, Minting [1]
Zhu, Xiangming [1]
Zheng, Yitao [1]
Wang, Yunbo [1]
Yang, Xiaokang [1]
Affiliations
[1] Shanghai Jiao Tong University, AI Institute, MoE Key Lab of Artificial Intelligence, Shanghai 200240, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Decoupled dynamics; model-based reinforcement learning; world model
DOI
10.1109/TPAMI.2023.3335263
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
World models learn the consequences of actions in vision-based interactive systems. However, in practical scenarios such as autonomous driving, noncontrollable dynamics that are independent of, or only sparsely dependent on, action signals often exist, making it challenging to learn effective world models. To address this issue, we propose Iso-Dream++, a model-based reinforcement learning approach with two main contributions. First, we optimize the inverse dynamics to encourage the world model to isolate controllable state transitions from the mixed spatiotemporal variations of the environment. Second, we perform policy optimization based on the decoupled latent imaginations, rolling out noncontrollable states into the future and adaptively associating them with the current controllable state. This allows long-horizon visuomotor control tasks to benefit from isolating the mixed dynamics sources in the wild; for example, a self-driving agent can anticipate the movement of other vehicles and thereby avoid potential risks. On top of our previous work (Pan et al., 2022), we further consider the sparse dependencies between controllable and noncontrollable states, address the training-collapse problem of state decoupling, and validate our approach in transfer learning setups. Our empirical study demonstrates that Iso-Dream++ significantly outperforms existing reinforcement learning models on CARLA and DeepMind Control.
Pages: 2788-2803
Page count: 16
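The abstract describes two mechanisms: an inverse-dynamics objective that isolates controllable state transitions, and imagination rollouts in which the noncontrollable branch evolves action-free and is then associated with the controllable branch for decision making. The following is a minimal conceptual sketch of that decoupled-rollout control flow only; every name, the linear-tanh dynamics, and the concatenation-based fusion are illustrative assumptions, not the paper's actual implementation, which learns neural transition models and adaptively associates the two branches.

```python
# A minimal conceptual sketch of decoupled latent rollouts, NOT the authors'
# implementation. Dimensions, the linear-tanh dynamics, and the naive fusion
# are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, HORIZON = 4, 2, 5

# Stand-ins for learned transition models: the controllable branch is
# conditioned on actions, the noncontrollable branch evolves action-free.
W_ctrl = rng.normal(scale=0.1, size=(STATE_DIM, STATE_DIM))
W_act = rng.normal(scale=0.1, size=(STATE_DIM, ACTION_DIM))
W_nonctrl = rng.normal(scale=0.1, size=(STATE_DIM, STATE_DIM))

def step_controllable(s_ctrl, action):
    """Controllable transition: depends on the agent's action."""
    return np.tanh(W_ctrl @ s_ctrl + W_act @ action)

def step_noncontrollable(s_nonctrl):
    """Noncontrollable transition: independent of the action signal."""
    return np.tanh(W_nonctrl @ s_nonctrl)

def imagine(s_ctrl, s_nonctrl, policy, horizon=HORIZON):
    """Roll out both branches and fuse them for the policy at each step.

    The noncontrollable branch is rolled into the future on its own, so the
    policy can anticipate dynamics it cannot influence (e.g., other vehicles).
    """
    trajectory = []
    for _ in range(horizon):
        fused = np.concatenate([s_ctrl, s_nonctrl])  # naive fusion; the paper
        action = policy(fused)                       # adaptively associates them
        s_ctrl = step_controllable(s_ctrl, action)
        s_nonctrl = step_noncontrollable(s_nonctrl)
        trajectory.append(fused)
    return trajectory

random_policy = lambda s: rng.normal(size=ACTION_DIM)
traj = imagine(rng.normal(size=STATE_DIM), rng.normal(size=STATE_DIM), random_policy)
print(len(traj), traj[0].shape)  # 5 (8,)
```

In the actual method, the two branches would be trained jointly with reconstruction and inverse-dynamics losses, and the association between branches would be learned rather than a fixed concatenation; the sketch only illustrates the control flow of rolling both branches forward during imagination.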