Episode-Experience Replay Based Tree-Backup Method for Off-Policy Actor-Critic Algorithm

Citations: 1
Authors
Jiang, Haobo [1 ]
Qian, Jianjun [1 ]
Xie, Jin [1 ]
Yang, Jian [1 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Key Lab Intelligent Percept & Syst High Dimens In, Minist Educ, Nanjing 210094, Peoples R China
Keywords
Off-policy actor-critic policy gradient; Tree-backup algorithm; All-action method; Episode-experience replay
DOI
10.1007/978-3-030-03398-9_48
CLC Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Off-policy algorithms have played important roles in deep reinforcement learning. Since the off-policy policy gradient is a biased estimate, previous works employed importance sampling to obtain an unbiased estimate, assuming the behavior policy is known in advance. However, it is difficult to choose a reasonable behavior policy for complex agents, and importance sampling usually produces large variance. To address these problems, this paper presents a novel actor-critic policy gradient algorithm. Specifically, we employ the tree-backup method in the off-policy setting to obtain an unbiased estimate of the target policy gradient without importance sampling. Meanwhile, we combine naive episode-experience replay with experience replay to obtain trajectory samples and to reduce the strong correlations between these samples. The experimental results demonstrate the advantages of the proposed method over the competing methods.
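For context (this formula is standard background, not reproduced from the paper itself): the tree-backup target referred to in the abstract is, in its usual n-step form (Precup et al., 2000; Sutton and Barto), a recursion that expands every action under the target policy \pi instead of importance-weighting the sampled action. A minimal sketch in LaTeX notation, assuming the usual action-value estimates Q(s, a):

G_{t:t+n} = R_{t+1} + \gamma \sum_{a \neq A_{t+1}} \pi(a \mid S_{t+1}) \, Q(S_{t+1}, a) + \gamma \, \pi(A_{t+1} \mid S_{t+1}) \, G_{t+1:t+n},
\qquad
G_{t:t+1} = R_{t+1} + \gamma \sum_{a} \pi(a \mid S_{t+1}) \, Q(S_{t+1}, a).

Because every action at each backup step is weighted directly by \pi (the "all-action" idea in the keywords), the target is unbiased for the target policy without importance ratios, regardless of which behavior policy generated the trajectory; hence the behavior policy's probabilities need not be known.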
Pages: 562-573
Number of pages: 12