Adaptive bias-variance trade-off in advantage estimator for actor-critic algorithms

Cited by: 3

Authors:

Chen, Yurou [1,2]

Zhang, Fengyi [1,2]

Liu, Zhiyong [1,2,3]

Affiliations:

[1] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing, Peoples R China

[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China

[3] Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Shanghai, Peoples R China

Source:

NEURAL NETWORKS | 2024 / Vol. 169

Keywords:

Reinforcement Learning; Policy gradient; Actor-critic; Value function; Bias-variance trade-off;

DOI:

10.1016/j.neunet.2023.10.023

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Actor-critic methods are leading in many challenging continuous control tasks. Advantage estimators, the most common critics in the actor-critic framework, combine state values from bootstrapping value functions and sample returns. Different combinations balance the bias introduced by state values and the variance returned by samples to reduce estimation errors. The bias and variance constantly fluctuate throughout training, leading to different optimal combinations. However, existing advantage estimators usually use fixed combinations that fail to account for the trade-off between minimizing bias and variance to find the optimal estimate. Our previous work on adaptive advantage estimation (AAE) analyzed the sources of bias and variance and offered two indicators. This paper further explores the relationship between the indicators and their optimal combination through typical numerical experiments. These analyses develop a general form of adaptive combinations of state values and sample returns to achieve low estimation errors. Empirical results on simulated robotic locomotion tasks show that our proposed estimators achieve similar or superior performance compared to previous generalized advantage estimators (GAE).
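For orientation, the fixed combination that the abstract contrasts with can be sketched as the standard generalized advantage estimator (GAE) with a constant mixing weight. This is a minimal illustration, not the adaptive estimator proposed in the paper: the function name `gae_advantages`, the parameter names, and the toy trajectory are assumptions made for this example. Setting `lam=0.0` leans entirely on the bootstrapped state values (more bias, less variance), while `lam=1.0` reduces to the sample return minus the state value (less bias, more variance).

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Fixed-lambda GAE for one trajectory (illustrative sketch).

    rewards: list of T rewards.
    values:  list of T + 1 critic estimates; the last entry bootstraps the
             tail of the trajectory (use 0.0 if the episode terminated).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error: depends on the critic, so biased but low variance.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of the current and future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# Hypothetical three-step trajectory that ends in a terminal state.
rewards = [1.0, 1.0, 1.0]
values = [0.5, 0.5, 0.5, 0.0]
td_only = gae_advantages(rewards, values, gamma=1.0, lam=0.0)  # [1.0, 1.0, 0.5]
mc_like = gae_advantages(rewards, values, gamma=1.0, lam=1.0)  # [2.5, 1.5, 0.5]
```

In this sketch `lam` stays constant for the whole of training, which is the kind of fixed combination the abstract refers to; an adaptive scheme would instead choose the weighting from indicators of the current bias and variance.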

Pages: 764-777

Page count: 14

47 items in total

[1] Adaptive Advantage Estimation for Actor-Critic Algorithms
Chen, Yurou
Zhang, Fengyi
Liu, Zhiyong
2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021
[2] Bias-Variance Trade-Off in Continuous Test Norming
Voncken, Lieke
Albers, Casper J.
Timmerman, Marieke E.
ASSESSMENT, 2021, 28 (08) : 1932 - 1948
[3] Actor-Critic Algorithms for Variance Minimization
Awate, Yogesh P.
TECHNOLOGICAL DEVELOPMENTS IN EDUCATION AND AUTOMATION, 2010 : 455 - 460
[4] Bias in Natural Actor-Critic Algorithms
Thomas, Philip S.
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 1), 2014, 32
[6] Bias-Variance Trade-Off and Shrinkage of Weights in Forecast Combination
Blanc, Sebastian M.
Setzer, Thomas
MANAGEMENT SCIENCE, 2020, 66 (12) : 5720 - 5737
[7] A closer look at the bias-variance trade-off in multivariate calibration
Faber, NM
JOURNAL OF CHEMOMETRICS, 1999, 13 (02) : 185 - 192
[8] Bias-variance trade-off for prequential model list selection
Fokoue, Ernest
Clarke, Bertrand
STATISTICAL PAPERS, 2011, 52 (04) : 813 - 833
[9] Bounding User Contributions: A Bias-Variance Trade-off in Differential Privacy
Amin, Kareem
Kulesza, Alex
Medina, Andres Muñoz
Vassilvitskii, Sergei
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
[10] Multiclass Learning with Margin: Exponential Rates with No Bias-Variance Trade-Off
Vigogna, Stefano
Meanti, Giacomo
De Vito, Ernesto
Rosasco, Lorenzo
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022