Episode-Experience Replay Based Tree-Backup Method for Off-Policy Actor-Critic Algorithm

Citations: 1
Authors
Jiang, Haobo [1 ]
Qian, Jianjun [1 ]
Xie, Jin [1 ]
Yang, Jian [1 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Key Lab Intelligent Percept & Syst High Dimens In, Minist Educ, Nanjing 210094, Peoples R China
Keywords
Off-policy actor-critic policy gradient; Tree-backup algorithm; All-action method; Episode-experience replay;
DOI
10.1007/978-3-030-03398-9_48
CLC Classification Number
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Off-policy algorithms play an important role in deep reinforcement learning. Since the off-policy policy gradient is a biased estimate, previous works employed importance sampling to obtain an unbiased estimate, which requires the behavior policy to be known in advance. However, it is difficult to choose a reasonable behavior policy for complex agents. Moreover, importance sampling usually produces large variance. To address these problems, this paper presents a novel actor-critic policy gradient algorithm. Specifically, we employ the tree-backup method in the off-policy setting to achieve an unbiased estimate of the target policy gradient without importance sampling. Meanwhile, we combine naive episode-experience replay with experience replay to obtain trajectory samples and to reduce the strong correlations between these samples. The experimental results demonstrate the advantages of the proposed method over competing methods.
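The tree-backup idea the abstract refers to can be illustrated with a short sketch. The n-step tree-backup return weights the taken action's branch by the target policy's probability and fills in the untaken actions with their current Q-estimates, so no importance ratios over the behavior policy are needed. This is a minimal illustration of the standard tree-backup recursion, not the paper's full actor-critic algorithm; the function name, tabular `Q`/`pi` arrays, and trajectory layout are assumptions for the sketch.

```python
import numpy as np

def tree_backup_return(rewards, next_states, next_actions, Q, pi, gamma=0.99):
    """n-step tree-backup return for one trajectory segment.

    rewards[k]      : reward observed on the k-th transition
    next_states[k]  : state reached by the k-th transition
    next_actions[k] : action actually taken at next_states[k]
                      (the last entry is unused: we bootstrap there)
    Q[s, a]         : current action-value estimates (2-D array)
    pi[s, a]        : target policy's action probabilities (2-D array)
    """
    # Bootstrap at the final state with a full expectation over actions.
    s_last = next_states[-1]
    G = rewards[-1] + gamma * np.dot(pi[s_last], Q[s_last])
    # Walk backwards through the earlier transitions.
    for k in range(len(rewards) - 2, -1, -1):
        s, a = next_states[k], next_actions[k]
        # Untaken actions contribute their Q-estimates; the taken action's
        # branch is continued with G, weighted by pi(a|s). No behavior-policy
        # probabilities (hence no importance sampling) appear anywhere.
        off_branch = np.dot(pi[s], Q[s]) - pi[s, a] * Q[s, a]
        G = rewards[k] + gamma * (off_branch + pi[s, a] * G)
    return G
```

Because every term is weighted by the target policy itself, the estimate stays unbiased regardless of which behavior policy generated the trajectory, at the cost of cutting the return whenever `pi[s, a]` is small.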
Pages: 562-573
Page count: 12
Related Papers
45 records in total
  • [21] Multi-agent off-policy actor-critic algorithm for distributed multi-task reinforcement learning
    Stankovic, Milos S.
    Beko, Marko
    Ilic, Nemanja
    Stankovic, Srdjan S.
    EUROPEAN JOURNAL OF CONTROL, 2023, 74
  • [22] Finite-Sample Analysis of Off-Policy Natural Actor-Critic With Linear Function Approximation
    Chen, Zaiwei
    Khodadadian, Sajad
    Maguluri, Siva Theja
    IEEE CONTROL SYSTEMS LETTERS, 2022, 6 : 2611 - 2616
  • [23] Distributional Soft Actor-Critic: Off-Policy Reinforcement Learning for Addressing Value Estimation Errors
    Duan, Jingliang
    Guan, Yang
    Li, Shengbo Eben
    Ren, Yangang
    Sun, Qi
    Cheng, Bo
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (11) : 6584 - 6598
  • [24] Max-Min Off-Policy Actor-Critic Method Focusing on Worst-Case Robustness to Model Misspecification
    Tanabe, Takumi
    Sato, Rei
    Fukuchi, Kazuto
    Sakuma, Jun
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [25] Memory-based soft actor-critic with prioritized experience replay for autonomous navigation
    Wei, Zhigang
    Xiao, Wendong
    Yuan, Liang
    Ran, Teng
    Cui, Jianping
    Lv, Kai
    INTELLIGENT SERVICE ROBOTICS, 2024, 17 (03) : 621 - 630
  • [26] Off-policy actor-critic deep reinforcement learning methods for alert prioritization in intrusion detection systems
    Chavali, Lalitha
    Krishnan, Abhinav
    Saxena, Paresh
    Mitra, Barsha
    Chivukula, Aneesh Sreevallabh
    COMPUTERS & SECURITY, 2024, 142
  • [27] Cooperative traffic signal control using Multi-step return and Off-policy Asynchronous Advantage Actor-Critic Graph algorithm
    Yang, Shantian
    Yang, Bo
    Wong, Hau-San
    Kang, Zhongfeng
    KNOWLEDGE-BASED SYSTEMS, 2019, 183
  • [28] Actor-Critic Off-Policy Learning for Optimal Control of Multiple-Model Discrete-Time Systems
    Skach, Jan
    Kiumarsi, Bahare
    Lewis, Frank L.
    Straka, Ondrej
    IEEE TRANSACTIONS ON CYBERNETICS, 2018, 48 (01) : 29 - 40
  • [29] Fast and stable learning of quasi-passive dynamic walking by an unstable biped robot based on off-policy natural actor-critic
    Ueno, Tsuyoshi
    Nakamura, Yutaka
    Takuma, Takashi
    Shibata, Tomohiro
    Hosoda, Koh
    Ishii, Shin
2006 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, VOLS 1-12, 2006, : 5226+
  • [30] Attention-based advantage actor-critic algorithm with prioritized experience replay for complex 2-D robotic motion planning
    Zhou, Chengmin
    Huang, Bingding
    Hassan, Haseeb
    Franti, Pasi
    JOURNAL OF INTELLIGENT MANUFACTURING, 2023, 34 (01) : 151 - 180