Efficient data use in incremental actor-critic algorithms

Cited: 7
Authors
Cheng, Yuhu [1 ]
Feng, Huanting [1 ]
Wang, Xuesong [1 ]
Affiliations
[1] China Univ Min & Technol, Sch Informat & Elect Engn, Xuzhou 221116, Jiangsu, Peoples R China
Funding
Specialized Research Fund for the Doctoral Program of Higher Education; National Natural Science Foundation of China;
Keywords
Actor-critic; Reinforcement learning; Incremental least-squares temporal difference; Recursive least-squares temporal difference; Policy evaluation; Function approximation;
DOI
10.1016/j.neucom.2011.11.034
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Actor-critic (AC) reinforcement learning methods are on-line approximations to policy iteration and are widely applied to large-scale Markov decision problems and high-dimensional learning control problems. To overcome the data inefficiency of incremental AC algorithms based on temporal difference (AC-TD), two new incremental AC algorithms (AC-RLSTD and AC-iLSTD) are proposed by applying a recursive least-squares TD (RLSTD(lambda)) algorithm and an incremental least-squares TD (iLSTD(lambda)) algorithm to the Critic evaluation; both make more efficient use of data than TD. The Critic estimates a value function using the RLSTD(lambda) or iLSTD(lambda) algorithm, and the Actor updates the policy along a regular gradient obtained from the TD error. Improving the learning-evaluation efficiency of the Critic in turn improves the policy-learning performance of the Actor. Simulation results on the learning control of an inverted pendulum and a mountain-car problem illustrate the effectiveness of the two proposed AC algorithms in comparison with the AC-TD algorithm. In addition, the AC-iLSTD with a greedy selection mechanism performs much better than the AC-iLSTD with a random selection mechanism. The simulations also analyze how different parameter settings of the eligibility trace affect the learning performance of the AC algorithms, and show that the initial value of the variance matrix in the AC-RLSTD algorithm should be chosen appropriately for each learning problem to obtain good performance. Crown Copyright (C) 2012 Published by Elsevier B.V. All rights reserved.
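The Critic update described in the abstract can be sketched as a standard RLSTD(lambda) recursion: an eligibility trace is accumulated, and the weight vector and variance matrix are updated via a rank-1 (Sherman-Morrison) correction, with the returned TD error available for the Actor's gradient step. This is a minimal illustrative sketch, not the paper's exact implementation; the class and parameter names (`RLSTDCritic`, `p_init`) are assumptions for the example.

```python
import numpy as np

class RLSTDCritic:
    """Minimal RLSTD(lambda) value-function estimator (illustrative sketch)."""

    def __init__(self, n_features, gamma=0.99, lam=0.8, p_init=10.0):
        self.gamma, self.lam = gamma, lam
        self.theta = np.zeros(n_features)      # value-function weights
        self.z = np.zeros(n_features)          # eligibility trace
        # variance matrix; its initial scale (p_init) is the tuning knob
        # the abstract notes must be chosen per learning problem
        self.P = np.eye(n_features) * p_init

    def update(self, phi, reward, phi_next):
        """One recursive least-squares TD step for transition (phi, reward, phi_next)."""
        # TD error under the current weights
        delta = reward + self.gamma * (self.theta @ phi_next) - self.theta @ phi
        # accumulate the eligibility trace
        self.z = self.gamma * self.lam * self.z + phi
        # rank-1 (Sherman-Morrison) update of P and gain vector k
        d = phi - self.gamma * phi_next
        Pz = self.P @ self.z
        k = Pz / (1.0 + d @ Pz)
        self.theta = self.theta + k * delta
        self.P = self.P - np.outer(k, d @ self.P)
        # the Actor would then take a regular gradient step using this TD error,
        # e.g. w += alpha * delta * grad_log_pi(s, a)
        return delta
```

With all-zero initial weights, the first TD error simply equals the immediate reward, after which the least-squares recursion starts reusing past data through `P` and the trace `z`.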
Pages: 346 - 354
Page count: 9
Related Papers
50 records total
  • [21] Generalizing Soft Actor-Critic Algorithms to Discrete Action Spaces
    Zhang, Le
    Gu, Yong
    Zhao, Xin
    Zhang, Yanshuo
    Zhao, Shu
    Jin, Yifei
    Wu, Xinxin
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT 1, 2025, 15031 : 34 - 49
  • [22] Parametrized actor-critic algorithms for finite-horizon MDPs
    Abdulla, Mohammed Shahid
    Bhatnagar, Shalabh
    2007 AMERICAN CONTROL CONFERENCE, VOLS 1-13, 2007, : 2701 - 2706
  • [23] A Critical Point Analysis of Actor-Critic Algorithms with Neural Networks
    Gottwald, Martin
    Shen, Hao
    Diepold, Klaus
    IFAC PAPERSONLINE, 2022, 55 (15): : 27 - 32
  • [24] Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms
    Xu, Tengyu
    Wang, Zhe
    Liang, Yingbin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [25] An Actor-Critic Algorithm With Second-Order Actor and Critic
    Wang, Jing
    Paschalidis, Ioannis Ch.
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2017, 62 (06) : 2689 - 2703
  • [26] Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time Analysis
    Chen, Ziyi
    Zhou, Yi
    Chen, Rong-Rong
    Zou, Shaofeng
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [27] Error controlled actor-critic
    Gao, Xingen
    Chao, Fei
    Zhou, Changle
    Ge, Zhen
    Yang, Longzhi
    Chang, Xiang
    Shang, Changjing
    Shen, Qiang
    INFORMATION SCIENCES, 2022, 612 : 62 - 74
  • [28] A Hessian Actor-Critic Algorithm
    Wang, Jing
    Paschalidis, Ioannis Ch
    2014 IEEE 53RD ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2014, : 1131 - 1136
  • [29] Sample-efficient Actor-Critic Reinforcement Learning with Supervised Data for Dialogue Management
    Su, Pei-Hao
    Budzianowski, Pawel
    Ultes, Stefan
    Gasic, Milica
    Young, Steve
    18TH ANNUAL MEETING OF THE SPECIAL INTEREST GROUP ON DISCOURSE AND DIALOGUE (SIGDIAL 2017), 2017, : 147 - 157
  • [30] Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning
    Zhong, Shan
    Liu, Quan
    Fu, QiMing
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2016, 2016