Dyna-Style Model-Based Reinforcement Learning with Model-Free Policy Optimization

Citations: 0
Authors
Dong, Kun [1 ,2 ]
Luo, Yongle [1 ,2 ]
Wang, Yuxin [1 ,2 ]
Liu, Yu [1 ,2 ]
Qu, Chengeng [1 ,2 ]
Zhang, Qiang [1 ,2 ]
Cheng, Erkang [1 ,2 ]
Sun, Zhiyong [1 ,2 ]
Song, Bo [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci, Inst Intelligent Machines, HFIPS, Hefei, Peoples R China
[2] Univ Sci & Technol China, Hefei, Peoples R China
[3] Jianghuai Frontier Technol Coordinat & Innovat Ctr, Hefei, Peoples R China
Keywords
Reinforcement learning; Robotics; Data efficiency; Algorithms
DOI
10.1016/j.knosys.2024.111428
CLC Classification
TP18 (Artificial intelligence theory)
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Dyna-style model-based reinforcement learning (MBRL) methods have demonstrated superior sample efficiency compared to their model-free counterparts, largely owing to their use of learned models. Despite these advancements, applying the learned models effectively remains challenging, due in large part to the intricate interdependence between model learning and policy optimization, which presents a significant theoretical gap in the field. This paper bridges this gap by providing, for the first time, a comprehensive theoretical analysis of Dyna-style MBRL and establishing a return bound in deterministic environments. Building upon this analysis, we propose a novel schema called Model-Based Reinforcement Learning with Model-Free Policy Optimization (MBMFPO). Compared to existing MBRL methods, the proposed schema integrates model-free policy optimization into the MBRL framework, together with several additional techniques. Experimental results on various continuous control tasks demonstrate that MBMFPO can significantly enhance sample efficiency and final performance compared to baseline methods. Furthermore, extensive ablation studies provide robust evidence for the effectiveness of each individual component within the MBMFPO schema. This work advances both the theoretical analysis and practical application of Dyna-style MBRL, paving the way for more efficient reinforcement learning methods.
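To make the schema concrete, the Python sketch below illustrates a generic Dyna-style loop (fit a dynamics model on real transitions, generate short synthetic rollouts, optimize the policy on them) with an extra model-free update on real data, which is one plausible reading of how MBMFPO "integrates model-free policy optimization into the MBRL framework." This is a minimal sketch under assumed interfaces: `real_env`, `model.fit`/`model.predict`, and `policy.act`/`policy.update` are hypothetical placeholders, not the authors' implementation.

```python
import random

def train(real_env, model, policy, num_iters=1000,
          model_rollout_len=5, synthetic_batches=20):
    # Hypothetical interfaces: real_env is Gym-like, model is a learned
    # dynamics model, policy exposes act() and update(). Not the paper's code.
    real_buffer, synthetic_buffer = [], []

    for _ in range(num_iters):
        # 1. Collect real experience with the current policy.
        state, done = real_env.reset(), False
        while not done:
            action = policy.act(state)
            next_state, reward, done = real_env.step(action)
            real_buffer.append((state, action, reward, next_state, done))
            state = next_state

        # 2. Fit the dynamics model to real transitions (supervised learning).
        model.fit(real_buffer)

        # 3. Dyna-style planning: short synthetic rollouts branched from
        #    states sampled out of the real buffer, limiting model-error compounding.
        for _ in range(synthetic_batches):
            s, *_ = random.choice(real_buffer)
            for _ in range(model_rollout_len):
                a = policy.act(s)
                s_next, r, d = model.predict(s, a)
                synthetic_buffer.append((s, a, r, s_next, d))
                if d:
                    break
                s = s_next

        # 4. Policy optimization on synthetic data (standard Dyna-style MBRL)...
        policy.update(synthetic_buffer)
        # ...plus a model-free update on real transitions, the additional
        # step this sketch attributes to the MBMFPO schema.
        policy.update(real_buffer)
```

Keeping the synthetic rollouts short (here `model_rollout_len=5`) is a common design choice in Dyna-style methods, since model prediction error compounds with rollout depth.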
Pages: 10