Learning to Reweight Imaginary Transitions for Model-Based Reinforcement Learning

Cited by: 0
Authors
Huang, Wenzhen [1 ,2 ]
Yin, Qiyue [1 ,2 ]
Zhang, Junge [1 ,2 ]
Huang, Kaiqi [1 ,2 ,3 ]
Affiliations
[1] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[2] Chinese Acad Sci, Inst Automat, CRISE, Beijing, Peoples R China
[3] CAS Ctr Excellence Brain Sci & Intelligence Techn, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
None
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Model-based reinforcement learning (RL) is more sample-efficient than model-free RL because it can train on imaginary trajectories generated by the learned dynamics model. When the model is inaccurate or biased, however, imaginary trajectories may be deleterious for training the action-value and policy functions. To alleviate this problem, this paper proposes to adaptively reweight the imaginary transitions, so as to reduce the negative effects of poorly generated trajectories. More specifically, we evaluate the effect of an imaginary transition by calculating the change in the loss computed on real samples when the transition is used to train the action-value and policy functions. Based on this evaluation criterion, we reweight each imaginary transition with a well-designed meta-gradient algorithm. Extensive experimental results demonstrate that our method outperforms state-of-the-art model-based and model-free RL algorithms on multiple tasks. Visualizing how the weights change during training further validates the necessity of the reweighting scheme.
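The mechanism in the abstract (score each imaginary transition by how using it for an update changes the loss on real samples, then adjust per-transition weights by a meta-gradient) can be sketched compactly. Below is a minimal, self-contained PyTorch sketch under simplifying assumptions: a toy critic with explicit parameter tensors, a single inner update step, and squared TD error on real transitions as the outer objective. Names such as q_value and meta_reweight_step are illustrative placeholders, not the authors' implementation.

import torch

torch.manual_seed(0)
obs_dim, act_dim, hidden = 4, 2, 32  # toy sizes

# Critic parameters as plain tensors, so the one-step inner update below
# stays differentiable with respect to the per-transition weights.
params = [
    (0.1 * torch.randn(obs_dim + act_dim, hidden)).requires_grad_(),
    torch.zeros(hidden, requires_grad=True),
    (0.1 * torch.randn(hidden, 1)).requires_grad_(),
    torch.zeros(1, requires_grad=True),
]

def q_value(p, s, a):
    w1, b1, w2, b2 = p
    h = torch.tanh(torch.cat([s, a], dim=-1) @ w1 + b1)
    return (h @ w2 + b2).squeeze(-1)

def td_error(p, batch, gamma=0.99):
    s, a, r, s2, a2 = batch
    target = (r + gamma * q_value(p, s2, a2)).detach()  # fixed TD target
    return q_value(p, s, a) - target

def meta_reweight_step(p, imag_batch, real_batch, inner_lr=1e-2):
    """One meta-gradient step on the weights of an imaginary batch."""
    n = imag_batch[0].shape[0]
    logits = torch.zeros(n, requires_grad=True)  # meta-variables
    w = torch.sigmoid(logits)                    # weights in (0, 1)
    inner_loss = (w * td_error(p, imag_batch).pow(2)).mean()
    # Hypothetical critic update on the weighted imaginary loss;
    # create_graph=True keeps it differentiable w.r.t. the weights.
    grads = torch.autograd.grad(inner_loss, p, create_graph=True)
    new_p = [pi - inner_lr * gi for pi, gi in zip(p, grads)]
    # Outer objective: loss of the updated critic on *real* transitions.
    outer_loss = td_error(new_p, real_batch).pow(2).mean()
    g_logits, = torch.autograd.grad(outer_loss, logits)
    # Descend on the meta-gradient: transitions whose use increases the
    # real-sample loss get a smaller weight.
    return torch.sigmoid(logits.detach() - g_logits)

def rand_batch(n):  # random placeholder transitions (s, a, r, s', a')
    return (torch.randn(n, obs_dim), torch.randn(n, act_dim),
            torch.randn(n), torch.randn(n, obs_dim), torch.randn(n, act_dim))

weights = meta_reweight_step(params, rand_batch(8), rand_batch(8))
print(weights)  # per-transition weights for the imaginary batch

The final step mirrors the criterion stated in the abstract: an imaginary transition whose hypothetical update raises the loss on real samples is down-weighted before it is actually used for training.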
Pages: 7848-7856
Page count: 9
Related Papers
50 in total
  • [41] Model-based reinforcement learning with model error and its application
    Tajima, Yoshiyuki
    Onisawa, Takehisa
    PROCEEDINGS OF SICE ANNUAL CONFERENCE, VOLS 1-8, 2007: 1333-1336
  • [42] Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal
    Agarwal, Alekh
    Kakade, Sham
    Yang, Lin F.
    CONFERENCE ON LEARNING THEORY, VOL 125, 2020, 125
  • [43] Reward Shaping for Model-Based Bayesian Reinforcement Learning
    Kim, Hyeoneun
    Lim, Woosang
    Lee, Kanghoon
    Noh, Yung-Kyun
    Kim, Kee-Eung
    PROCEEDINGS OF THE TWENTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2015: 3548-3555
  • [44] Model-based Adversarial Meta-Reinforcement Learning
    Lin, Zichuan
    Thomas, Garrett
    Yang, Guangwen
    Ma, Tengyu
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33 (NEURIPS 2020), 2020, 33
  • [45] On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning
    Zhang, Baohe
    Rajan, Raghu
    Pineda, Luis
    Lambert, Nathan
    Biedenkapp, Andre
    Chua, Kurtland
    Hutter, Frank
    Calandra, Roberto
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [46] Efficient reinforcement learning: Model-based acrobot control
    Boone, G
    1997 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION - PROCEEDINGS, VOLS 1-4, 1997: 229-234
  • [47] Multiple model-based reinforcement learning for nonlinear control
    Samejima, K
    Katagiri, K
    Doya, K
    Kawato, M
    ELECTRONICS AND COMMUNICATIONS IN JAPAN PART III-FUNDAMENTAL ELECTRONIC SCIENCE, 2006, 89(09): 54-69
  • [48] Model-based reinforcement learning for approximate optimal regulation
    Kamalapurkar, Rushikesh
    Walters, Patrick
    Dixon, Warren E.
    AUTOMATICA, 2016, 64: 94-104
  • [49] Model-based Bayesian Reinforcement Learning for Dialogue Management
    Lison, Pierre
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013: 475-479
  • [50] Model-based Lifelong Reinforcement Learning with Bayesian Exploration
    Fu, Haotian
    Yu, Shangqun
    Littman, Michael
    Konidaris, George
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022