Episode-Experience Replay Based Tree-Backup Method for Off-Policy Actor-Critic Algorithm

Citations: 1
Authors
Jiang, Haobo [1 ]
Qian, Jianjun [1 ]
Xie, Jin [1 ]
Yang, Jian [1 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Key Lab Intelligent Percept & Syst High Dimens In, Minist Educ, Nanjing 210094, Peoples R China
Keywords
Off-policy actor-critic policy gradient; Tree-backup algorithm; All-action method; Episode-experience replay
DOI
10.1007/978-3-030-03398-9_48
CLC Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Off-policy algorithms have played important roles in deep reinforcement learning. Since the off-policy policy gradient is a biased estimate, previous works employed importance sampling to obtain an unbiased estimate, assuming the behavior policy is known in advance. However, it is difficult to choose a reasonable behavior policy for complex agents, and importance sampling usually produces large variance. To address these problems, this paper presents a novel actor-critic policy gradient algorithm. Specifically, we employ the tree-backup method in the off-policy setting to obtain an unbiased estimate of the target policy gradient without importance sampling. Meanwhile, we combine naive episode-experience replay with experience replay to obtain trajectory samples and to reduce the strong correlations between these samples. The experimental results demonstrate the advantages of the proposed method over the competing methods.
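For context (this formula is standard background, not reproduced from the paper itself): the tree-backup target referred to in the abstract is, in its usual n-step form (Precup et al., 2000; Sutton and Barto), a recursion that expands every action under the target policy \pi instead of importance-weighting the sampled action. A minimal sketch in LaTeX notation, assuming the usual action-value estimates Q(s, a):

G_{t:t+n} = R_{t+1} + \gamma \sum_{a \neq A_{t+1}} \pi(a \mid S_{t+1}) \, Q(S_{t+1}, a) + \gamma \, \pi(A_{t+1} \mid S_{t+1}) \, G_{t+1:t+n},
\qquad
G_{t:t+1} = R_{t+1} + \gamma \sum_{a} \pi(a \mid S_{t+1}) \, Q(S_{t+1}, a).

Because every action at each backup step is weighted directly by \pi (the "all-action" idea in the keywords), the target is unbiased for the target policy without importance ratios, regardless of which behavior policy generated the trajectory; hence the behavior policy's probabilities need not be known.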
Pages: 562-573
Number of pages: 12