Adaptive bias-variance trade-off in advantage estimator for actor-critic algorithms

Times cited: 3
Authors
Chen, Yurou [1, 2]
Zhang, Fengyi [1, 2]
Liu, Zhiyong [1, 2, 3]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[3] Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Shanghai, Peoples R China
Keywords
Reinforcement Learning; Policy gradient; Actor-critic; Value function; Bias-variance trade-off
DOI
10.1016/j.neunet.2023.10.023
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Actor-critic methods achieve leading performance on many challenging continuous control tasks. Advantage estimators, the most common critics in the actor-critic framework, combine state values from bootstrapped value functions with sample returns. Different combinations balance the bias introduced by the state values against the variance introduced by the sample returns in order to reduce estimation error. Because bias and variance fluctuate constantly throughout training, the optimal combination changes as well. However, existing advantage estimators usually fix the combination in advance and therefore cannot adjust the trade-off between bias and variance to find the optimal estimate. Our previous work on adaptive advantage estimation (AAE) analyzed the sources of bias and variance and proposed two indicators. This paper further explores, through typical numerical experiments, the relationship between these indicators and the optimal combination. These analyses lead to a general form of adaptive combination of state values and sample returns that achieves low estimation error. Empirical results on simulated robotic locomotion tasks show that the proposed estimators achieve performance similar or superior to generalized advantage estimation (GAE).
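To make the idea of "combining state values and sample returns" concrete, the following is a minimal sketch of the standard GAE recursion, i.e. the fixed-combination baseline the abstract compares against. It is not the authors' adaptive (AAE) estimator; its indicators and adaptive form are not given in this record. The function names, the list-based episode layout (a `values` list with one extra bootstrap entry), and the example numbers are illustrative assumptions. The single knob `lam` sets the mix: `lam=0` yields the one-step TD error, which leans fully on the critic (lower variance, bias from critic errors), while `lam=1` yields the sampled discounted return minus V(s_t) (no intermediate bootstrapping, higher variance).

```python
from typing import List


def gae_advantages(
    rewards: List[float],
    values: List[float],
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Standard GAE for one trajectory segment (not the paper's adaptive AAE).

    `values` holds V(s_0), ..., V(s_T): exactly one more entry than `rewards`.
    The last entry is the bootstrap value of the state after the final step;
    pass 0.0 if that state is terminal.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Backward recursion: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(len(rewards))):
        # One-step TD error: a sampled reward plus bootstrapped state values.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def discounted_returns(
    rewards: List[float], bootstrap: float, gamma: float = 0.99
) -> List[float]:
    """Sample returns G_t, bootstrapped only from the value after the last step."""
    returns = [0.0] * len(rewards)
    running = bootstrap
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    values = [0.5, 0.5, 0.5, 0.0]  # hypothetical critic outputs; last entry is terminal
    # lam=0: one-step TD errors (state values dominate the estimate).
    print(gae_advantages(rewards, values, gamma=0.9, lam=0.0))
    # lam=1: sample return minus the baseline V(s_t) (sample returns dominate).
    print(gae_advantages(rewards, values, gamma=0.9, lam=1.0))
    print([g - v for g, v in zip(discounted_returns(rewards, values[-1], 0.9), values)])
```

The last two printed lists coincide, because with `lam=1` the TD errors telescope into the return minus the baseline. In this sketch `lam` stays constant for all of training; the abstract's argument is that the best mix shifts as bias and variance fluctuate, which is what an adaptive estimator sets out to track.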
Pages: 764-777
Number of pages: 14