Parameter-Free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients

Cited by: 1
Authors
Saglam, Baturay [1 ]
Mutlu, Furkan Burak [1 ]
Cicek, Dogan Can [1 ]
Kozat, Suleyman Serdar [1 ]
Affiliations
[1] Bilkent Univ, Dept Elect & Elect Engn, TR-06800 Bilkent, Ankara, Turkiye
Keywords
Deep reinforcement learning; Actor-critic methods; Estimation bias; Deterministic policy gradients; Continuous control;
DOI
10.1007/s11063-024-11461-y
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Approximation of the value functions in value-based deep reinforcement learning induces overestimation bias, resulting in suboptimal policies. We show that when the reinforcement signals received by the agents have a high variance, deep actor-critic approaches that overcome the overestimation bias lead to a substantial underestimation bias. We first address the detrimental issues in the existing approaches that aim to overcome this underestimation error. Then, through extensive statistical analysis, we introduce a novel, parameter-free Deep Q-learning variant to reduce this underestimation bias in deterministic policy gradients. By sampling the weights of a linear combination of two approximate critics from a highly shrunk estimation bias interval, our Q-value update rule is not affected by the variance of the rewards received by the agents throughout learning. We test the performance of the introduced improvement on a set of MuJoCo and Box2D continuous control tasks and demonstrate that it outperforms the existing approaches and improves the baseline actor-critic algorithm in most of the environments tested.
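The weighting scheme the abstract describes — a linear combination of two approximate critics whose weight is sampled from a narrow interval — can be sketched as below. This is a minimal illustration, not the paper's method: the function name, the uniform sampling distribution, and the interval bounds `low`/`high` are all assumptions for exposition; the paper derives its own shrunk interval.

```python
import random

def combined_target(q1, q2, low=0.0, high=0.5):
    """Illustrative target for a two-critic convex combination.

    q1, q2: the two critics' Q-value estimates for the same
    state-action pair. The weight beta is sampled uniformly from
    [low, high]; these bounds are hypothetical placeholders, not
    the interval derived in the paper.
    """
    beta = random.uniform(low, high)
    q_min, q_max = min(q1, q2), max(q1, q2)
    # beta -> 0 recovers the pessimistic minimum used by Clipped
    # Double Q-learning (as in TD3); larger beta mixes in the
    # optimistic estimate, reducing the underestimation bias.
    return beta * q_max + (1.0 - beta) * q_min
```

Because the weight is a fixed-interval random draw rather than a tuned hyperparameter, an update rule of this shape stays parameter-free with respect to the reward variance, which is the property the abstract emphasizes.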
Pages: 25