Parameter-Free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients

Cited by: 1
Authors
Saglam, Baturay [1 ]
Mutlu, Furkan Burak [1 ]
Cicek, Dogan Can [1 ]
Kozat, Suleyman Serdar [1 ]
Affiliations
[1] Bilkent Univ, Dept Elect & Elect Engn, TR-06800 Bilkent, Ankara, Turkiye
Keywords
Deep reinforcement learning; Actor-critic methods; Estimation bias; Deterministic policy gradients; Continuous control
DOI
10.1007/s11063-024-11461-y
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Approximation of the value functions in value-based deep reinforcement learning induces overestimation bias, resulting in suboptimal policies. We show that when the reinforcement signals received by the agents have high variance, deep actor-critic approaches that overcome the overestimation bias lead to a substantial underestimation bias. We first examine the detrimental issues in existing approaches that aim to overcome this underestimation error. Then, through extensive statistical analysis, we introduce a novel, parameter-free variant of Deep Q-learning that reduces this underestimation bias in deterministic policy gradients. By sampling the weights of a linear combination of two approximate critics from a highly shrunk estimation-bias interval, our Q-value update rule is unaffected by the variance of the rewards received by the agents throughout learning. We test the performance of the introduced improvement on a set of MuJoCo and Box2D continuous control tasks and demonstrate that it outperforms the existing approaches and improves the baseline actor-critic algorithm in most of the tested environments.
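The target-value rule described in the abstract can be illustrated with a short sketch. The following is a hypothetical illustration, not the authors' released implementation: it assumes a clipped double-Q (TD3-style) setup with two critics, and the interval bounds beta_low and beta_high are placeholder values; the paper derives the actual sampling interval from its statistical analysis of the estimation bias.

    import numpy as np

    def sampled_target_q(q1_next, q2_next, reward, discount, rng,
                         beta_low=0.0, beta_high=0.25):
        # Pessimistic and optimistic next-state estimates from the two critics:
        # the minimum tends to underestimate (clipped double Q-learning),
        # the maximum tends to overestimate (plain Q-learning).
        q_min = np.minimum(q1_next, q2_next)
        q_max = np.maximum(q1_next, q2_next)
        # Sample the combination weight from a narrow interval at each update.
        # The draw is independent of the observed rewards, so the resulting
        # target is insensitive to the variance of the reward signal.
        beta = rng.uniform(beta_low, beta_high)
        q_next = beta * q_max + (1.0 - beta) * q_min
        return reward + discount * q_next

    rng = np.random.default_rng(seed=0)
    q1 = np.array([1.2, 0.8, 2.0])   # critic 1 estimates for a batch of 3
    q2 = np.array([1.0, 1.1, 1.7])   # critic 2 estimates
    r = np.array([0.1, 0.0, 0.5])
    print(sampled_target_q(q1, q2, r, discount=0.99, rng=rng))

Because the weight is drawn from a fixed, pre-derived interval rather than tuned per task, no new hyperparameter is introduced, which is the sense in which such a rule can be called parameter-free.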
Pages: 25
Related Papers
50 items in total
  • [1] Parameter-Free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients
    Baturay Saglam
    Furkan Burak Mutlu
    Dogan Can Cicek
    Suleyman Serdar Kozat
NEURAL PROCESSING LETTERS, 2024, 56
  • [2] Parameter-Free On-line Deep Learning
    Wawrzynski, Pawel
    AUTOMATION 2017: INNOVATIONS IN AUTOMATION, ROBOTICS AND MEASUREMENT TECHNIQUES, 2017, 550 : 543 - 553
  • [3] Learning to Pour using Deep Deterministic Policy Gradients
    Do, Chau
    Gordillo, Camilo
    Burgard, Wolfram
    2018 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2018, : 3074 - 3079
  • [4] Parameter-free Locally Accelerated Conditional Gradients
    Carderera, Alejandro
    Diakonikolas, Jelena
    Lin, Cheuk Yin
    Pokutta, Sebastian
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [5] Automatic VMAT Machine Parameter Optimization Using Deep Deterministic Policy Gradients
    Hrinivich, W.
    Li, H.
    Lee, J.
    MEDICAL PHYSICS, 2022, 49 (06) : E117 - E117
  • [6] Deep Deterministic Policy Gradients with Transfer Learning Framework in StarCraft Micromanagement
    Xie, Dong
    Zhong, Xiangnan
    2019 IEEE INTERNATIONAL CONFERENCE ON ELECTRO INFORMATION TECHNOLOGY (EIT), 2019, : 410 - 415
  • [7] Deep Ensemble Reinforcement Learning with Multiple Deep Deterministic Policy Gradient Algorithm
    Wu, Junta
    Li, Huiyun
MATHEMATICAL PROBLEMS IN ENGINEERING, 2020, 2020
  • [8] Selective Catalytic Reduction System Ammonia Injection Control Based on Deep Deterministic Policy Reinforcement Learning
    Xie, Peiran
    Zhang, Guangming
    Niu, Yuguang
    Sun, Tianshu
    FRONTIERS IN ENERGY RESEARCH, 2021, 9
  • [9] A parameter-free learning automaton scheme
    Ren, Xudie
    Li, Shenghong
    Ge, Hao
    FRONTIERS IN NEUROROBOTICS, 2022, 16
  • [10] Generative Adversarial Inverse Reinforcement Learning With Deep Deterministic Policy Gradient
    Zhan, Ming
    Fan, Jingjing
    Guo, Jianying
    IEEE ACCESS, 2023, 11 : 87732 - 87746