Episode-Experience Replay Based Tree-Backup Method for Off-Policy Actor-Critic Algorithm

Cited by: 1
Authors
Jiang, Haobo [1 ]
Qian, Jianjun [1 ]
Xie, Jin [1 ]
Yang, Jian [1 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Key Lab Intelligent Percept & Syst High Dimens In, Minist Educ, Nanjing 210094, Peoples R China
Keywords
Off-policy actor-critic policy gradient; Tree-backup algorithm; All-action method; Episode-experience replay;
DOI
10.1007/978-3-030-03398-9_48
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Off-policy algorithms play an important role in deep reinforcement learning. Since the off-policy policy gradient is a biased estimate, previous works employed importance sampling to obtain an unbiased estimate, which requires the behavior policy to be known in advance. However, it is difficult to choose a reasonable behavior policy for complex agents, and importance sampling usually produces large variance. To address these problems, this paper presents a novel actor-critic policy gradient algorithm. Specifically, we employ the tree-backup method in the off-policy setting to obtain an unbiased estimate of the target-policy gradient without importance sampling. Meanwhile, we combine naive episode-experience replay with experience replay to collect trajectory samples and reduce the strong correlations between them. Experimental results demonstrate the advantages of the proposed method over the compared methods.
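The key ingredient mentioned in the abstract is the tree-backup return, which replaces importance-sampling ratios with an expectation over the target policy at every off-policy step. The sketch below illustrates that backward recursion for one sampled trajectory. It is a minimal NumPy illustration, not the authors' implementation: the function name tree_backup_targets, the array layout, and the convention that terminal next-state Q-values are zero are assumptions made here for clarity.

```python
import numpy as np

def tree_backup_targets(rewards, q_next, pi_next, next_actions, gamma=0.99):
    """Backward recursion for the tree-backup return over one trajectory.

    rewards      : (T,)   reward r_t received after the t-th action
    q_next       : (T, A) critic estimates Q(s_{t+1}, .); all zeros when
                          s_{t+1} is terminal
    pi_next      : (T, A) target-policy probabilities pi(. | s_{t+1})
    next_actions : (T,)   action actually executed in s_{t+1}
                          (the final entry is never used)
    Returns (T,) tree-backup returns, usable as critic targets.
    """
    T = len(rewards)
    targets = np.empty(T)
    # Last step: bootstrap with the full expectation under the target policy
    # (this term vanishes when q_next[-1] is all zeros, i.e. the episode ended).
    G = rewards[-1] + gamma * np.dot(pi_next[-1], q_next[-1])
    targets[-1] = G
    # Earlier steps: expectation over the actions NOT taken, plus the
    # probability-weighted continuation through the action that WAS taken.
    for t in range(T - 2, -1, -1):
        a = next_actions[t]
        expected_rest = np.dot(pi_next[t], q_next[t]) - pi_next[t, a] * q_next[t, a]
        G = rewards[t] + gamma * (expected_rest + pi_next[t, a] * G)
        targets[t] = G
    return targets

# Toy usage on a 3-step trajectory with 2 actions (terminal last state):
rewards = np.array([1.0, 0.0, 1.0])
q_next = np.array([[0.5, 0.2], [0.1, 0.4], [0.0, 0.0]])
pi_next = np.array([[0.7, 0.3], [0.6, 0.4], [0.5, 0.5]])
next_actions = np.array([0, 1, 0])
print(tree_backup_targets(rewards, q_next, pi_next, next_actions))
```

Because every branch of the backup is weighted by the target policy's own probabilities, no ratio between target and behavior policies appears, which is what removes the importance-sampling variance the abstract refers to. In the paper, such returns are computed over whole trajectories drawn from the combined episode-experience/experience replay buffer, which is why the recursion above runs over a full episode.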
Pages: 562-573
Page count: 12
Related Papers
45 in total
  • [1] Noisy Importance Sampling Actor-Critic: An Off-Policy Actor-Critic With Experience Replay
    Tasfi, Norman
    Capretz, Miriam
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
  • [2] Generalized Off-Policy Actor-Critic
    Zhang, Shangtong
    Boehmer, Wendelin
    Whiteson, Shimon
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [3] Off-Policy Actor-critic for Recommender Systems
    Chen, Minmin
    Xu, Can
    Gatto, Vince
    Jain, Devanshu
    Kumar, Aviral
    Chi, Ed
    PROCEEDINGS OF THE 16TH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2022, 2022, : 338 - 349
  • [4] Meta attention for Off-Policy Actor-Critic
    Huang, Jiateng
    Huang, Wanrong
    Lan, Long
    Wu, Dan
    NEURAL NETWORKS, 2023, 163 : 86 - 96
  • [5] Off-Policy Actor-Critic with Emphatic Weightings
    Graves, Eric
    Imani, Ehsan
    Kumaraswamy, Raksha
    White, Martha
    JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [6] Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm
    Diddigi, Raghuram Bharadwaj
    Jain, Prateek
    Prabuchandran, K. J.
    Bhatnagar, Shalabh
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022
  • [7] Variance Penalized On-Policy and Off-Policy Actor-Critic
    Jain, Arushi
    Patil, Gandharv
    Jain, Ayush
    Khetarpal, Khimya
    Precup, Doina
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 7899 - 7907
  • [8] Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm
    Khodadadian, Sajad
    Chen, Zaiwei
    Maguluri, Siva Theja
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [9] Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality
    Xu, Tengyu
    Yang, Zhuoran
    Wang, Zhaoran
    Liang, Yingbin
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [10] Distributed off-Policy Actor-Critic Reinforcement Learning with Policy Consensus
    Zhang, Yan
    Zavlanos, Michael M.
    2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019, : 4674 - 4679