Episode-Experience Replay Based Tree-Backup Method for Off-Policy Actor-Critic Algorithm

Citations: 1
Authors
Jiang, Haobo [1 ]
Qian, Jianjun [1 ]
Xie, Jin [1 ]
Yang, Jian [1 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Key Lab Intelligent Percept & Syst High Dimens In, Minist Educ, Nanjing 210094, Peoples R China
Keywords
Off-policy actor-critic policy gradient; Tree-backup algorithm; All-action method; Episode-experience replay;
DOI
10.1007/978-3-030-03398-9_48
CLC Classification Number
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Off-policy algorithms play an important role in deep reinforcement learning. Since the off-policy policy gradient is a biased estimate, previous works employed importance sampling to obtain an unbiased estimate, which requires the behavior policy to be known in advance. However, it is difficult to choose a reasonable behavior policy for complex agents. Moreover, importance sampling usually produces large variance. To address these problems, this paper presents a novel actor-critic policy gradient algorithm. Specifically, we employ the tree-backup method in the off-policy setting to achieve an unbiased estimate of the target policy gradient without importance sampling. Meanwhile, we combine naive episode-experience replay with experience replay to obtain trajectory samples and to reduce the strong correlations between these samples. The experimental results demonstrate the advantages of the proposed method over competing methods.
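The tree-backup idea the abstract refers to can be illustrated with a short sketch. The n-step tree-backup return weights the taken action's branch by the target policy's probability and fills in the untaken actions with their current Q-estimates, so no importance ratios over the behavior policy are needed. This is a minimal illustration of the standard tree-backup recursion, not the paper's full actor-critic algorithm; the function name, tabular `Q`/`pi` arrays, and trajectory layout are assumptions for the sketch.

```python
import numpy as np

def tree_backup_return(rewards, next_states, next_actions, Q, pi, gamma=0.99):
    """n-step tree-backup return for one trajectory segment.

    rewards[k]      : reward observed on the k-th transition
    next_states[k]  : state reached by the k-th transition
    next_actions[k] : action actually taken at next_states[k]
                      (the last entry is unused: we bootstrap there)
    Q[s, a]         : current action-value estimates (2-D array)
    pi[s, a]        : target policy's action probabilities (2-D array)
    """
    # Bootstrap at the final state with a full expectation over actions.
    s_last = next_states[-1]
    G = rewards[-1] + gamma * np.dot(pi[s_last], Q[s_last])
    # Walk backwards through the earlier transitions.
    for k in range(len(rewards) - 2, -1, -1):
        s, a = next_states[k], next_actions[k]
        # Untaken actions contribute their Q-estimates; the taken action's
        # branch is continued with G, weighted by pi(a|s). No behavior-policy
        # probabilities (hence no importance sampling) appear anywhere.
        off_branch = np.dot(pi[s], Q[s]) - pi[s, a] * Q[s, a]
        G = rewards[k] + gamma * (off_branch + pi[s, a] * G)
    return G
```

Because every term is weighted by the target policy itself, the estimate stays unbiased regardless of which behavior policy generated the trajectory, at the cost of cutting the return whenever `pi[s, a]` is small.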
Pages: 562-573
Page count: 12
Related Papers
45 records in total
  • [21] Multi-agent off-policy actor-critic algorithm for distributed multi-task reinforcement learning
    Stankovic, Milos S.
    Beko, Marko
    Ilic, Nemanja
    Stankovic, Srdjan S.
    EUROPEAN JOURNAL OF CONTROL, 2023, 74
  • [22] Finite-Sample Analysis of Off-Policy Natural Actor-Critic With Linear Function Approximation
    Chen, Zaiwei
    Khodadadian, Sajad
    Maguluri, Siva Theja
    IEEE CONTROL SYSTEMS LETTERS, 2022, 6 : 2611 - 2616
  • [23] Distributional Soft Actor-Critic: Off-Policy Reinforcement Learning for Addressing Value Estimation Errors
    Duan, Jingliang
    Guan, Yang
    Li, Shengbo Eben
    Ren, Yangang
    Sun, Qi
    Cheng, Bo
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (11) : 6584 - 6598
  • [24] Max-Min Off-Policy Actor-Critic Method Focusing on Worst-Case Robustness to Model Misspecification
    Tanabe, Takumi
    Sato, Rei
    Fukuchi, Kazuto
    Sakuma, Jun
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [25] Memory-based soft actor-critic with prioritized experience replay for autonomous navigation
    Wei, Zhigang
    Xiao, Wendong
    Yuan, Liang
    Ran, Teng
    Cui, Jianping
    Lv, Kai
    INTELLIGENT SERVICE ROBOTICS, 2024, 17 (03) : 621 - 630
  • [26] Off-policy actor-critic deep reinforcement learning methods for alert prioritization in intrusion detection systems
    Chavali, Lalitha
    Krishnan, Abhinav
    Saxena, Paresh
    Mitra, Barsha
    Chivukula, Aneesh Sreevallabh
    COMPUTERS & SECURITY, 2024, 142
  • [27] Cooperative traffic signal control using Multi-step return and Off-policy Asynchronous Advantage Actor-Critic Graph algorithm
    Yang, Shantian
    Yang, Bo
    Wong, Hau-San
    Kang, Zhongfeng
    KNOWLEDGE-BASED SYSTEMS, 2019, 183
  • [28] Actor-Critic Off-Policy Learning for Optimal Control of Multiple-Model Discrete-Time Systems
    Skach, Jan
    Kiumarsi, Bahare
    Lewis, Frank L.
    Straka, Ondrej
    IEEE TRANSACTIONS ON CYBERNETICS, 2018, 48 (01) : 29 - 40
  • [29] Fast and stable learning of quasi-passive dynamic walking by an unstable biped robot based on off-policy natural actor-critic
    Ueno, Tsuyoshi
    Nakamura, Yutaka
    Takuma, Takashi
    Shibata, Tomohiro
    Hosoda, Koh
    Ishii, Shin
2006 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, VOLS 1-12, 2006, : 5226+
  • [30] Attention-based advantage actor-critic algorithm with prioritized experience replay for complex 2-D robotic motion planning
    Zhou, Chengmin
    Huang, Bingding
    Hassan, Haseeb
    Franti, Pasi
    JOURNAL OF INTELLIGENT MANUFACTURING, 2023, 34 (01) : 151 - 180