Parameter-Free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients

Cited by: 1
Authors
Saglam, Baturay [1 ]
Mutlu, Furkan Burak [1 ]
Cicek, Dogan Can [1 ]
Kozat, Suleyman Serdar [1 ]
Affiliations
[1] Bilkent Univ, Dept Elect & Elect Engn, TR-06800 Bilkent, Ankara, Turkiye
Keywords
Deep reinforcement learning; Actor-critic methods; Estimation bias; Deterministic policy gradients; Continuous control;
DOI
10.1007/s11063-024-11461-y
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Approximation of the value functions in value-based deep reinforcement learning induces overestimation bias, resulting in suboptimal policies. We show that when the reinforcement signals received by the agents have a high variance, deep actor-critic approaches that overcome the overestimation bias lead to a substantial underestimation bias. We first address the detrimental issues in the existing approaches that aim to overcome this underestimation error. Then, through extensive statistical analysis, we introduce a novel, parameter-free Deep Q-learning variant to reduce this underestimation bias in deterministic policy gradients. By sampling the weights of a linear combination of two approximate critics from a highly shrunk estimation bias interval, our Q-value update rule is not affected by the variance of the rewards received by the agents throughout learning. We test the performance of the introduced improvement on a set of MuJoCo and Box2D continuous control tasks and demonstrate that it outperforms the existing approaches and improves the baseline actor-critic algorithm in most of the environments tested.
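The weighting scheme the abstract describes — a linear combination of two approximate critics whose weight is sampled from a narrow interval — can be sketched as below. This is a minimal illustration, not the paper's method: the function name, the uniform sampling distribution, and the interval bounds `low`/`high` are all assumptions for exposition; the paper derives its own shrunk interval.

```python
import random

def combined_target(q1, q2, low=0.0, high=0.5):
    """Illustrative target for a two-critic convex combination.

    q1, q2: the two critics' Q-value estimates for the same
    state-action pair. The weight beta is sampled uniformly from
    [low, high]; these bounds are hypothetical placeholders, not
    the interval derived in the paper.
    """
    beta = random.uniform(low, high)
    q_min, q_max = min(q1, q2), max(q1, q2)
    # beta -> 0 recovers the pessimistic minimum used by Clipped
    # Double Q-learning (as in TD3); larger beta mixes in the
    # optimistic estimate, reducing the underestimation bias.
    return beta * q_max + (1.0 - beta) * q_min
```

Because the weight is a fixed-interval random draw rather than a tuned hyperparameter, an update rule of this shape stays parameter-free with respect to the reward variance, which is the property the abstract emphasizes.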
Pages: 25