Learning to Reweight Imaginary Transitions for Model-Based Reinforcement Learning

Cited by: 0
Authors
Huang, Wenzhen [1 ,2 ]
Yin, Qiyue [1 ,2 ]
Zhang, Junge [1 ,2 ]
Huang, Kaiqi [1 ,2 ,3 ]
Affiliations
[1] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[2] Chinese Acad Sci, Inst Automat, CRISE, Beijing, Peoples R China
[3] CAS Ctr Excellence Brain Sci & Intelligence Techn, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
None
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Model-based reinforcement learning (RL) is more sample-efficient than model-free RL because it can train on imaginary trajectories generated by the learned dynamics model. When the model is inaccurate or biased, however, imaginary trajectories may be deleterious for training the action-value and policy functions. To alleviate this problem, this paper proposes to adaptively reweight the imaginary transitions, so as to reduce the negative effects of poorly generated trajectories. More specifically, we evaluate the effect of an imaginary transition by calculating the change in the loss computed on real samples when the transition is used to train the action-value and policy functions. Based on this evaluation criterion, we reweight each imaginary transition with a well-designed meta-gradient algorithm. Extensive experimental results demonstrate that our method outperforms state-of-the-art model-based and model-free RL algorithms on multiple tasks. Visualizing how the weights change during training further validates the necessity of the reweighting scheme.
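The mechanism in the abstract (score each imaginary transition by how using it for an update changes the loss on real samples, then adjust per-transition weights by a meta-gradient) can be sketched compactly. Below is a minimal, self-contained PyTorch sketch under simplifying assumptions: a toy critic with explicit parameter tensors, a single inner update step, and squared TD error on real transitions as the outer objective. Names such as q_value and meta_reweight_step are illustrative placeholders, not the authors' implementation.

import torch

torch.manual_seed(0)
obs_dim, act_dim, hidden = 4, 2, 32  # toy sizes

# Critic parameters as plain tensors, so the one-step inner update below
# stays differentiable with respect to the per-transition weights.
params = [
    (0.1 * torch.randn(obs_dim + act_dim, hidden)).requires_grad_(),
    torch.zeros(hidden, requires_grad=True),
    (0.1 * torch.randn(hidden, 1)).requires_grad_(),
    torch.zeros(1, requires_grad=True),
]

def q_value(p, s, a):
    w1, b1, w2, b2 = p
    h = torch.tanh(torch.cat([s, a], dim=-1) @ w1 + b1)
    return (h @ w2 + b2).squeeze(-1)

def td_error(p, batch, gamma=0.99):
    s, a, r, s2, a2 = batch
    target = (r + gamma * q_value(p, s2, a2)).detach()  # fixed TD target
    return q_value(p, s, a) - target

def meta_reweight_step(p, imag_batch, real_batch, inner_lr=1e-2):
    """One meta-gradient step on the weights of an imaginary batch."""
    n = imag_batch[0].shape[0]
    logits = torch.zeros(n, requires_grad=True)  # meta-variables
    w = torch.sigmoid(logits)                    # weights in (0, 1)
    inner_loss = (w * td_error(p, imag_batch).pow(2)).mean()
    # Hypothetical critic update on the weighted imaginary loss;
    # create_graph=True keeps it differentiable w.r.t. the weights.
    grads = torch.autograd.grad(inner_loss, p, create_graph=True)
    new_p = [pi - inner_lr * gi for pi, gi in zip(p, grads)]
    # Outer objective: loss of the updated critic on *real* transitions.
    outer_loss = td_error(new_p, real_batch).pow(2).mean()
    g_logits, = torch.autograd.grad(outer_loss, logits)
    # Descend on the meta-gradient: transitions whose use increases the
    # real-sample loss get a smaller weight.
    return torch.sigmoid(logits.detach() - g_logits)

def rand_batch(n):  # random placeholder transitions (s, a, r, s', a')
    return (torch.randn(n, obs_dim), torch.randn(n, act_dim),
            torch.randn(n), torch.randn(n, obs_dim), torch.randn(n, act_dim))

weights = meta_reweight_step(params, rand_batch(8), rand_batch(8))
print(weights)  # per-transition weights for the imaginary batch

The final step mirrors the criterion stated in the abstract: an imaginary transition whose hypothetical update raises the loss on real samples is down-weighted before it is actually used for training.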
Pages: 7848-7856
Page count: 9
Related Papers
50 in total
  • [41] Model-based reinforcement learning with model error and its application
    Tajima, Yoshiyuki
    Onisawa, Takehisa
    PROCEEDINGS OF SICE ANNUAL CONFERENCE, VOLS 1-8, 2007: 1333-1336
  • [42] Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal
    Agarwal, Alekh
    Kakade, Sham
    Yang, Lin F.
    CONFERENCE ON LEARNING THEORY, VOL 125, 2020, 125
  • [43] Reward Shaping for Model-Based Bayesian Reinforcement Learning
    Kim, Hyeoneun
    Lim, Woosang
    Lee, Kanghoon
    Noh, Yung-Kyun
    Kim, Kee-Eung
    PROCEEDINGS OF THE TWENTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2015: 3548-3555
  • [44] Model-based Adversarial Meta-Reinforcement Learning
    Lin, Zichuan
    Thomas, Garrett
    Yang, Guangwen
    Ma, Tengyu
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33 (NEURIPS 2020), 2020, 33
  • [45] On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning
    Zhang, Baohe
    Rajan, Raghu
    Pineda, Luis
    Lambert, Nathan
    Biedenkapp, Andre
    Chua, Kurtland
    Hutter, Frank
    Calandra, Roberto
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [46] Efficient reinforcement learning: Model-based acrobot control
    Boone, G
    1997 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION - PROCEEDINGS, VOLS 1-4, 1997: 229-234
  • [47] Multiple model-based reinforcement learning for nonlinear control
    Samejima, K
    Katagiri, K
    Doya, K
    Kawato, M
    ELECTRONICS AND COMMUNICATIONS IN JAPAN PART III-FUNDAMENTAL ELECTRONIC SCIENCE, 2006, 89(09): 54-69
  • [48] Model-based reinforcement learning for approximate optimal regulation
    Kamalapurkar, Rushikesh
    Walters, Patrick
    Dixon, Warren E.
    AUTOMATICA, 2016, 64: 94-104
  • [49] Model-based Bayesian Reinforcement Learning for Dialogue Management
    Lison, Pierre
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013: 475-479
  • [50] Model-based Lifelong Reinforcement Learning with Bayesian Exploration
    Fu, Haotian
    Yu, Shangqun
    Littman, Michael
    Konidaris, George
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022