Adaptive bias-variance trade-off in advantage estimator for actor-critic algorithms

Cited by: 3
Authors
Chen, Yurou [1 ,2 ]
Zhang, Fengyi [1 ,2 ]
Liu, Zhiyong [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[3] Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Shanghai, Peoples R China
Keywords
Reinforcement Learning; Policy gradient; Actor-critic; Value function; Bias-variance trade-off;
DOI
10.1016/j.neunet.2023.10.023
CLC classification
TP18 [Theory of artificial intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Actor-critic methods lead in many challenging continuous control tasks. Advantage estimators, the most common critics in the actor-critic framework, combine state values from bootstrapped value functions with sampled returns. Different combinations balance the bias introduced by the state values against the variance of the sampled returns to reduce estimation error. Because bias and variance fluctuate throughout training, the optimal combination changes as well. However, existing advantage estimators typically use fixed combinations that fail to trade off bias against variance in search of the optimal estimate. Our previous work on adaptive advantage estimation (AAE) analyzed the sources of bias and variance and proposed two indicators. This paper further explores the relationship between those indicators and the optimal combination through representative numerical experiments. These analyses yield a general form of adaptive combination of state values and sample returns that achieves low estimation error. Empirical results on simulated robotic locomotion tasks show that our proposed estimators match or outperform the previous generalized advantage estimator (GAE).
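The fixed combination the abstract contrasts with its adaptive scheme is the standard GAE recurrence, in which a single parameter λ blends one-step bootstrapped TD errors (low variance, biased by the value function) with longer sampled returns (less biased, higher variance). A minimal sketch of that fixed-λ baseline, assuming a NumPy setting with the function name chosen for illustration (the paper's adaptive estimator itself is not reproduced here):

```python
import numpy as np

def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Fixed-lambda generalized advantage estimation (GAE).

    lam = 0 reduces to the one-step TD advantage (lowest variance,
    most bias from the value function); lam = 1 reduces to the
    Monte Carlo return minus the baseline (unbiased, highest variance).
    """
    T = len(rewards)
    values = np.append(values, last_value)  # V(s_0), ..., V(s_T)
    advantages = np.zeros(T)
    gae = 0.0
    # Accumulate discounted TD errors backwards through the trajectory.
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

An adaptive estimator in the spirit of the abstract would replace the fixed `lam` with a value chosen per update from bias and variance indicators measured during training.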
Pages: 764-777 (14 pages)