Episode-Experience Replay Based Tree-Backup Method for Off-Policy Actor-Critic Algorithm

Cited by: 1
Authors
Jiang, Haobo [1 ]
Qian, Jianjun [1 ]
Xie, Jin [1 ]
Yang, Jian [1 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Key Lab Intelligent Percept & Syst High Dimens In, Minist Educ, Nanjing 210094, Peoples R China
Keywords
Off-policy actor-critic policy gradient; Tree-backup algorithm; All-action method; Episode-experience replay;
DOI
10.1007/978-3-030-03398-9_48
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Off-policy algorithms play an important role in deep reinforcement learning. Since the off-policy policy gradient is a biased estimate, previous works employed importance sampling to obtain an unbiased estimate, which requires the behavior policy to be known in advance. However, it is difficult to choose a reasonable behavior policy for complex agents, and importance sampling usually produces large variance. To address these problems, this paper presents a novel actor-critic policy gradient algorithm. Specifically, we employ the tree-backup method in the off-policy setting to obtain an unbiased estimate of the target-policy gradient without importance sampling. Meanwhile, we combine naive episode-experience replay with experience replay to collect trajectory samples and reduce the strong correlations between them. Experimental results demonstrate the advantages of the proposed method over the compared methods.
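The key ingredient mentioned in the abstract is the tree-backup return, which replaces importance-sampling ratios with an expectation over the target policy at every off-policy step. The sketch below illustrates that backward recursion for one sampled trajectory. It is a minimal NumPy illustration, not the authors' implementation: the function name tree_backup_targets, the array layout, and the convention that terminal next-state Q-values are zero are assumptions made here for clarity.

```python
import numpy as np

def tree_backup_targets(rewards, q_next, pi_next, next_actions, gamma=0.99):
    """Backward recursion for the tree-backup return over one trajectory.

    rewards      : (T,)   reward r_t received after the t-th action
    q_next       : (T, A) critic estimates Q(s_{t+1}, .); all zeros when
                          s_{t+1} is terminal
    pi_next      : (T, A) target-policy probabilities pi(. | s_{t+1})
    next_actions : (T,)   action actually executed in s_{t+1}
                          (the final entry is never used)
    Returns (T,) tree-backup returns, usable as critic targets.
    """
    T = len(rewards)
    targets = np.empty(T)
    # Last step: bootstrap with the full expectation under the target policy
    # (this term vanishes when q_next[-1] is all zeros, i.e. the episode ended).
    G = rewards[-1] + gamma * np.dot(pi_next[-1], q_next[-1])
    targets[-1] = G
    # Earlier steps: expectation over the actions NOT taken, plus the
    # probability-weighted continuation through the action that WAS taken.
    for t in range(T - 2, -1, -1):
        a = next_actions[t]
        expected_rest = np.dot(pi_next[t], q_next[t]) - pi_next[t, a] * q_next[t, a]
        G = rewards[t] + gamma * (expected_rest + pi_next[t, a] * G)
        targets[t] = G
    return targets

# Toy usage on a 3-step trajectory with 2 actions (terminal last state):
rewards = np.array([1.0, 0.0, 1.0])
q_next = np.array([[0.5, 0.2], [0.1, 0.4], [0.0, 0.0]])
pi_next = np.array([[0.7, 0.3], [0.6, 0.4], [0.5, 0.5]])
next_actions = np.array([0, 1, 0])
print(tree_backup_targets(rewards, q_next, pi_next, next_actions))
```

Because every branch of the backup is weighted by the target policy's own probabilities, no ratio between target and behavior policies appears, which is what removes the importance-sampling variance the abstract refers to. In the paper, such returns are computed over whole trajectories drawn from the combined episode-experience/experience replay buffer, which is why the recursion above runs over a full episode.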
Pages: 562-573
Page count: 12
Related Papers
45 in total
  • [1] Noisy Importance Sampling Actor-Critic: An Off-Policy Actor-Critic With Experience Replay
    Tasfi, Norman
    Capretz, Miriam
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
  • [2] Generalized Off-Policy Actor-Critic
    Zhang, Shangtong
    Boehmer, Wendelin
    Whiteson, Shimon
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [3] Off-Policy Actor-critic for Recommender Systems
    Chen, Minmin
    Xu, Can
    Gatto, Vince
    Jain, Devanshu
    Kumar, Aviral
    Chi, Ed
    PROCEEDINGS OF THE 16TH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2022, 2022, : 338 - 349
  • [4] Meta attention for Off-Policy Actor-Critic
    Huang, Jiateng
    Huang, Wanrong
    Lan, Long
    Wu, Dan
    NEURAL NETWORKS, 2023, 163 : 86 - 96
  • [5] Off-Policy Actor-Critic with Emphatic Weightings
    Graves, Eric
    Imani, Ehsan
    Kumaraswamy, Raksha
    White, Martha
    JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [6] Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm
    Diddigi, Raghuram Bharadwaj
    Jain, Prateek
    Prabuchandran, K. J.
    Bhatnagar, Shalabh
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022
  • [7] Variance Penalized On-Policy and Off-Policy Actor-Critic
    Jain, Arushi
    Patil, Gandharv
    Jain, Ayush
    Khetarpal, Khimya
    Precup, Doina
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 7899 - 7907
  • [8] Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm
    Khodadadian, Sajad
    Chen, Zaiwei
    Maguluri, Siva Theja
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [9] Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality
    Xu, Tengyu
    Yang, Zhuoran
    Wang, Zhaoran
    Liang, Yingbin
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [10] Distributed off-Policy Actor-Critic Reinforcement Learning with Policy Consensus
    Zhang, Yan
    Zavlanos, Michael M.
    2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019, : 4674 - 4679