Efficient data use in incremental actor-critic algorithms

Cited by: 7
Authors
Cheng, Yuhu [1 ]
Feng, Huanting [1 ]
Wang, Xuesong [1 ]
Affiliations
[1] China Univ Min & Technol, Sch Informat & Elect Engn, Xuzhou 221116, Jiangsu, Peoples R China
Funding
Specialized Research Fund for the Doctoral Program of Higher Education; National Natural Science Foundation of China;
Keywords
Actor-critic; Reinforcement learning; Incremental least-squares temporal difference; Recursive least-squares temporal difference; Policy evaluation; Function approximation;
DOI
10.1016/j.neucom.2011.11.034
CLC number
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Actor-critic (AC) reinforcement learning methods are on-line approximations to policy iteration and are widely applied to large-scale Markov decision problems and high-dimensional learning control problems. To overcome the data inefficiency of incremental AC algorithms based on temporal difference (AC-TD), two new incremental AC algorithms (AC-RLSTD and AC-iLSTD) are proposed by applying a recursive least-squares TD (RLSTD(lambda)) algorithm and an incremental least-squares TD (iLSTD(lambda)) algorithm to the Critic evaluation, both of which make more efficient use of data than TD. The Critic estimates a value function using the RLSTD(lambda) or iLSTD(lambda) algorithm, and the Actor updates the policy based on a regular gradient obtained from the TD error. The improvement in the Critic's evaluation efficiency thus contributes to the improvement in the Actor's policy learning performance. Simulation results on the learning control of an inverted pendulum and a mountain-car problem illustrate the effectiveness of the two proposed AC algorithms in comparison with the AC-TD algorithm. In addition, the AC-iLSTD with a greedy selection mechanism performs much better than the AC-iLSTD with a random selection mechanism. In the simulations, the effect of different eligibility-trace parameter settings on the learning performance of the AC algorithms is analyzed. Furthermore, it is found that the initial value of the variance matrix in the AC-RLSTD algorithm should be chosen appropriately for each learning problem to obtain better performance. Crown Copyright (C) 2012 Published by Elsevier B.V. All rights reserved.
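The Critic recursion described in the abstract can be sketched as follows. This is an illustrative reconstruction of a standard RLSTD(lambda) update (gain vector, inverse-correlation matrix P) paired with a softmax Actor driven by the TD error, on a toy chain task; the one-hot features, the toy environment, the step size `alpha`, and the initial variance `p0` are assumptions made for this example, not the paper's experimental setup.

```python
import numpy as np

def phi(s, n_states):
    """One-hot feature vector for a tabular state (an illustrative choice)."""
    f = np.zeros(n_states)
    f[s] = 1.0
    return f

class RLSTDCritic:
    """Critic using the recursive least-squares TD(lambda) recursion."""
    def __init__(self, n_features, lam=0.8, gamma=0.95, p0=10.0):
        self.theta = np.zeros(n_features)   # value-function weights
        self.P = np.eye(n_features) * p0    # inverse-correlation matrix; p0 is the initial variance
        self.z = np.zeros(n_features)       # eligibility trace
        self.lam, self.gamma = lam, gamma

    def reset_trace(self):
        self.z[:] = 0.0

    def update(self, f, f_next, r):
        self.z = self.gamma * self.lam * self.z + f
        d = f - self.gamma * f_next         # temporal feature difference
        Pz = self.P @ self.z
        k = Pz / (1.0 + d @ Pz)             # gain vector
        delta = r + self.gamma * (self.theta @ f_next) - self.theta @ f
        self.theta = self.theta + k * delta # recursive value update
        self.P = self.P - np.outer(k, d @ self.P)
        return delta                        # TD error, reused by the Actor

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy 5-state chain: action 1 moves right, action 0 moves left;
# reward 1 on reaching the rightmost (terminal) state.
rng = np.random.default_rng(0)
n_states, n_actions, alpha = 5, 2, 0.1
critic = RLSTDCritic(n_states)
w = np.zeros((n_states, n_actions))         # softmax policy weights (Actor)

for episode in range(200):
    s = 0
    critic.reset_trace()
    for _ in range(50):
        p = softmax(w[s])
        a = rng.choice(n_actions, p=p)
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        done = (s_next == n_states - 1)
        r = 1.0 if done else 0.0
        f_next = np.zeros(n_states) if done else phi(s_next, n_states)
        delta = critic.update(phi(s, n_states), f_next, r)
        grad = -p
        grad[a] += 1.0                      # gradient of log softmax w.r.t. w[s]
        w[s] += alpha * delta * grad        # regular-gradient Actor update
        if done:
            break
        s = s_next
```

Note how the single TD error `delta` couples the two learners: the Critic consumes it through the least-squares recursion, while the Actor uses it as the regular-gradient step size, so a more data-efficient Critic directly sharpens the Actor's policy updates.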
Pages: 346-354
Page count: 9
Related papers
50 items
  • [41] Boosting Exploration in Actor-Critic Algorithms by Incentivizing Plausible Novel States
    Banerjee, Chayan
    Chen, Zhiyong
    Noman, Nasimul
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023, : 7009 - 7014
  • [42] Stackelberg Actor-Critic: Game-Theoretic Reinforcement Learning Algorithms
    Zheng, Liyuan
    Fiez, Tanner
    Alumbaugh, Zane
    Chasnov, Benjamin
    Ratliff, Lillian J.
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 9217 - 9224
  • [43] Multi-Agent Natural Actor-Critic Reinforcement Learning Algorithms
    Trivedi, Prashant
    Hemachandra, Nandyala
    DYNAMIC GAMES AND APPLICATIONS, 2023, 13 (01) : 25 - 55
  • [44] Better Exploration with Optimistic Actor-Critic
    Ciosek, Kamil
    Quan Vuong
    Loftin, Robert
    Hofmann, Katja
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [45] Twin Delayed Hierarchical Actor-Critic
    Anca, Mihai
    Studley, Matthew
    2021 7TH INTERNATIONAL CONFERENCE ON AUTOMATION, ROBOTICS AND APPLICATIONS (ICARA 2021), 2021, : 221 - 225
  • [46] Generative Adversarial Soft Actor-Critic
    Hwang, Hyo-Seok
    Kim, Yoojoong
    Seok, Junhee
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,
  • [47] Robust Actor-Critic With Relative Entropy Regulating Actor
    Cheng, Yuhu
    Huang, Longyang
    Chen, C. L. Philip
    Wang, Xuesong
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (11) : 9054 - 9063
  • [48] An Actor-Critic Algorithm for SVM Hyperparameters
    Kim, Chayoung
    Park, Jung-min
    Kim, Hye-young
    INFORMATION SCIENCE AND APPLICATIONS 2018, ICISA 2018, 2019, 514 : 653 - 661
  • [49] Offline-Online Actor-Critic
    Wang X.
    Hou D.
    Huang L.
    Cheng Y.
    IEEE Transactions on Artificial Intelligence, 2024, 5 (01): : 61 - 69
  • [50] Actor-Critic Model Predictive Control
    Romero, Angel
    Song, Yunlong
    Scaramuzza, Davide
    2024 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2024), 2024, : 14777 - 14784