Efficient data use in incremental actor-critic algorithms

Cited by: 7
Authors
Cheng, Yuhu [1 ]
Feng, Huanting [1 ]
Wang, Xuesong [1 ]
Affiliations
[1] China Univ Min & Technol, Sch Informat & Elect Engn, Xuzhou 221116, Jiangsu, Peoples R China
Funding
Specialized Research Fund for the Doctoral Program of Higher Education; National Natural Science Foundation of China;
Keywords
Actor-critic; Reinforcement learning; Incremental least-squares temporal difference; Recursive least-squares temporal difference; Policy evaluation; Function approximation;
DOI
10.1016/j.neucom.2011.11.034
CLC classification number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Actor-critic (AC) reinforcement learning methods are on-line approximations to policy iteration and are widely applied to large-scale Markov decision problems and high-dimensional learning control problems. To overcome the data inefficiency of incremental AC algorithms based on temporal difference learning (AC-TD), two new incremental AC algorithms (AC-RLSTD and AC-iLSTD) are proposed by applying a recursive least-squares TD (RLSTD(lambda)) algorithm and an incremental least-squares TD (iLSTD(lambda)) algorithm to the Critic evaluation; both make more efficient use of data than TD. The Critic estimates a value function using the RLSTD(lambda) or iLSTD(lambda) algorithm, and the Actor updates the policy by a regular gradient based on the TD error. The improvement in the Critic's evaluation efficiency contributes to the Actor's policy learning performance. Simulation results on the learning control of an inverted pendulum and a mountain-car problem illustrate the effectiveness of the two proposed AC algorithms in comparison with the AC-TD algorithm. In addition, the AC-iLSTD algorithm performs much better with a greedy selection mechanism than with a random selection mechanism. The simulations also analyze the effect of different eligibility-trace parameter settings on the learning performance of the AC algorithms. Furthermore, it is found that the initial value of the variance matrix in the AC-RLSTD algorithm should be chosen appropriately for each learning problem to obtain good performance. Crown Copyright (C) 2012 Published by Elsevier B.V. All rights reserved.
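To connect the abstract to concrete update rules, here is a minimal sketch of one time step of an RLSTD(lambda) critic paired with a regular-gradient actor update. It follows the standard RLSTD(lambda) recursion from the least-squares TD literature; the function names, hyper-parameter values, and the log-likelihood form of the actor gradient are illustrative assumptions, not the paper's exact equations.

```python
import numpy as np

def rlstd_lambda_step(theta, P, z, phi_s, phi_s_next, reward,
                      gamma=0.95, lam=0.7):
    """One recursive least-squares TD(lambda) critic step.

    theta : value-function weights, V(s) ~= theta . phi(s)
    P     : variance matrix (initialized as beta * I; per the abstract,
            beta must be tuned for each learning problem)
    z     : eligibility trace vector
    """
    z = gamma * lam * z + phi_s            # decay and accumulate the trace
    d = phi_s - gamma * phi_s_next         # temporal feature difference
    Pz = P @ z
    k = Pz / (1.0 + d @ Pz)                # Sherman-Morrison gain vector
    theta = theta + k * (reward - d @ theta)   # least-squares weight update
    P = P - np.outer(k, d @ P)             # rank-one variance-matrix update
    return theta, P, z

def actor_step(w, grad_log_pi, td_error, alpha=0.01):
    """Regular-gradient actor update driven by the critic's TD error:
    w <- w + alpha * delta * grad_w log pi(a|s)."""
    return w + alpha * td_error * grad_log_pi
```

At each transition (s, a, r, s'), the critic step refines theta; the scalar TD error delta = r + gamma * theta.phi(s') - theta.phi(s) is then computed from the updated weights and fed to actor_step. Initializing P as beta * np.eye(n) with a per-problem beta mirrors the abstract's remark about choosing the variance-matrix initial value appropriately.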
Pages: 346-354 (9 pages)