Dyna-style Model-based reinforcement learning with Model-Free Policy Optimization

Citations: 0
Authors
Dong, Kun [1 ,2 ]
Luo, Yongle [1 ,2 ]
Wang, Yuxin [1 ,2 ]
Liu, Yu [1 ,2 ]
Qu, Chengeng [1 ,2 ]
Zhang, Qiang [1 ,2 ]
Cheng, Erkang [1 ,2 ]
Sun, Zhiyong [1 ,2 ]
Song, Bo [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci, Inst Intelligent Machines, HFIPS, Hefei, Peoples R China
[2] Univ Sci & Technol China, Hefei, Peoples R China
[3] Jianghuai Frontier Technol Coordinat & Innovat Ctr, Hefei, Peoples R China
Keywords
Reinforcement learning; Robotics; Data efficiency; Algorithms
DOI
10.1016/j.knosys.2024.111428
CLC Number
TP18 [Artificial intelligence theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Dyna-style model-based reinforcement learning (MBRL) methods have demonstrated superior sample efficiency compared to their model-free counterparts, largely owing to their use of learned models. Despite these advancements, applying the learned models effectively remains challenging, chiefly because of the intricate interdependence between model learning and policy optimization, which presents a significant theoretical gap in this field. This paper bridges this gap by providing, for the first time, a comprehensive theoretical analysis of Dyna-style MBRL and establishing a return bound in deterministic environments. Building upon this analysis, we propose a novel schema called Model-Based Reinforcement Learning with Model-Free Policy Optimization (MBMFPO). Compared to existing MBRL methods, the proposed schema integrates model-free policy optimization into the MBRL framework, along with several additional techniques. Experimental results on various continuous control tasks demonstrate that MBMFPO can significantly enhance sample efficiency and final performance compared to baseline methods. Furthermore, extensive ablation studies provide robust evidence for the effectiveness of each individual component within the MBMFPO schema. This work advances both the theoretical analysis and practical application of Dyna-style MBRL, paving the way for more efficient reinforcement learning methods.
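To make the Dyna-style loop in the abstract concrete, the following is a minimal sketch of its classic tabular ancestor, Dyna-Q: a model-free Q-learning update is applied to every real transition, the same transition trains a learned model, and extra "planning" updates replay model-generated transitions through the identical model-free rule. The toy chain environment, optimistic initialization, and all hyperparameters here are illustrative assumptions, not details from the paper; MBMFPO itself operates on continuous control tasks with learned neural models.

```python
import random

N = 6  # toy deterministic chain: states 0..N-1, reward 1 at the right end

def step(s, a):
    """Ground-truth environment dynamics (hidden from the agent)."""
    s2 = max(0, min(N - 1, s + a))
    r = 1.0 if s2 == N - 1 else 0.0
    return s2, r, s2 == N - 1  # next state, reward, done

def dyna_q(episodes=60, planning_steps=10, alpha=0.5, gamma=0.95, eps=0.1):
    # Optimistic initialization drives systematic exploration of untried actions.
    Q = {(s, a): 1.0 for s in range(N) for a in (-1, 1)}
    model = {}  # learned deterministic model: (s, a) -> (s', r, done)

    def update(s, a, r, s2, done):
        # Model-free Q-learning update, shared by real and simulated experience.
        target = r if done else r + gamma * max(Q[(s2, -1)], Q[(s2, 1)])
        Q[(s, a)] += alpha * (target - Q[(s, a)])

    for _ in range(episodes):
        s = 0
        for _ in range(2 * N):
            # epsilon-greedy action from the current value estimates
            if random.random() < eps:
                a = random.choice((-1, 1))
            else:
                a = max((-1, 1), key=lambda x: Q[(s, x)])
            s2, r, done = step(s, a)
            update(s, a, r, s2, done)        # 1) model-free update on real data
            model[(s, a)] = (s2, r, done)    # 2) model learning
            for _ in range(planning_steps):  # 3) planning on model-generated data
                (ps, pa), (ps2, pr, pd) = random.choice(list(model.items()))
                update(ps, pa, pr, ps2, pd)
            s = s2
            if done:
                break
    return Q
```

After training, the greedy policy should move right toward the rewarding state from every non-terminal state; the planning loop is what lets value information propagate far faster per real environment step than model-free learning alone.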
Pages: 10
Related Papers
50 records in total
  • [41] Model-Based and Model-Free Mechanisms of Human Motor Learning
    Haith, Adrian M.
    Krakauer, John W.
    PROGRESS IN MOTOR CONTROL: NEURAL, COMPUTATIONAL AND DYNAMIC APPROACHES, 2013, 782 : 1 - 21
  • [42] Model-based learning retrospectively updates model-free values
    Doody, Max
    Van Swieten, Maaike M. H.
    Manohar, Sanjay G.
    SCIENTIFIC REPORTS, 2022, 12 (01)
  • [43] Model gradient: unified model and policy learning in model-based reinforcement learning
    Jia, Chengxing
    Zhang, Fuxiang
    Xu, Tian
    Pang, Jing-Cheng
    Zhang, Zongzhang
    Yu, Yang
    FRONTIERS OF COMPUTER SCIENCE, 2024, 18 (04)
  • [45] Neurostimulation Reveals Context-Dependent Arbitration Between Model-Based and Model-Free Reinforcement Learning
    Weissengruber, Sebastian
    Lee, Sang Wan
    O'Doherty, John P.
    Ruff, Christian C.
    CEREBRAL CORTEX, 2019, 29 (11) : 4850 - 4862
  • [46] Sliding mode heading control for AUV based on continuous hybrid model-free and model-based reinforcement learning
    Wang, Dianrui
    Shen, Yue
    Wan, Junhe
    Sha, Qixin
    Li, Guangliang
    Chen, Guanzhong
    He, Bo
    APPLIED OCEAN RESEARCH, 2022, 118
  • [47] Impairment of arbitration between model-based and model-free reinforcement learning in obsessive-compulsive disorder
    Ruan, Zhongqiang
    Seger, Carol A.
    Yang, Qiong
    Kim, Dongjae
    Lee, Sang Wan
    Chen, Qi
    Peng, Ziwen
    FRONTIERS IN PSYCHIATRY, 2023, 14
  • [48] Gaze data reveal distinct choice processes underlying model-based and model-free reinforcement learning
    Konovalov, Arkady
    Krajbich, Ian
    NATURE COMMUNICATIONS, 2016, 7
  • [49] Impact of provoked stress on model-free and model-based reinforcement learning in individuals with alcohol use disorder
    Wyckmans, Florent
    Chatard, Armand
    Kornreich, Charles
    Gruson, Damien
    Jaafari, Nemat
    Noel, Xavier
    ADDICTIVE BEHAVIORS REPORTS, 2024, 20
  • [50] Individual Variation in Model-Free and Model-Based Reinforcement Learning and Methamphetamine-Taking Behavior in Rats
    Groman, Stephanie
    Massi, Bart
    Lee, Daeyeol
    Taylor, Jane
    NEUROPSYCHOPHARMACOLOGY, 2016, 41 : S436 - S436