Parameter-Free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients

Cited by: 1
Authors
Saglam, Baturay [1 ]
Mutlu, Furkan Burak [1 ]
Cicek, Dogan Can [1 ]
Kozat, Suleyman Serdar [1 ]
Affiliations
[1] Bilkent Univ, Dept Elect & Elect Engn, TR-06800 Bilkent, Ankara, Turkiye
Keywords
Deep reinforcement learning; Actor-critic methods; Estimation bias; Deterministic policy gradients; Continuous control
DOI
10.1007/s11063-024-11461-y
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Approximation of the value functions in value-based deep reinforcement learning induces overestimation bias, resulting in suboptimal policies. We show that when the reinforcement signals received by the agents have high variance, deep actor-critic approaches that overcome the overestimation bias lead to a substantial underestimation bias. We first examine the detrimental issues in existing approaches that aim to overcome this underestimation error. Then, through extensive statistical analysis, we introduce a novel, parameter-free variant of Deep Q-learning that reduces this underestimation bias in deterministic policy gradients. By sampling the weights of a linear combination of two approximate critics from a highly shrunk estimation-bias interval, our Q-value update rule is unaffected by the variance of the rewards received by the agents throughout learning. We test the performance of the introduced improvement on a set of MuJoCo and Box2D continuous control tasks and demonstrate that it outperforms the existing approaches and improves the baseline actor-critic algorithm in most of the tested environments.
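The target-value rule described in the abstract can be illustrated with a short sketch. The following is a hypothetical illustration, not the authors' released implementation: it assumes a clipped double-Q (TD3-style) setup with two critics, and the interval bounds beta_low and beta_high are placeholder values; the paper derives the actual sampling interval from its statistical analysis of the estimation bias.

    import numpy as np

    def sampled_target_q(q1_next, q2_next, reward, discount, rng,
                         beta_low=0.0, beta_high=0.25):
        # Pessimistic and optimistic next-state estimates from the two critics:
        # the minimum tends to underestimate (clipped double Q-learning),
        # the maximum tends to overestimate (plain Q-learning).
        q_min = np.minimum(q1_next, q2_next)
        q_max = np.maximum(q1_next, q2_next)
        # Sample the combination weight from a narrow interval at each update.
        # The draw is independent of the observed rewards, so the resulting
        # target is insensitive to the variance of the reward signal.
        beta = rng.uniform(beta_low, beta_high)
        q_next = beta * q_max + (1.0 - beta) * q_min
        return reward + discount * q_next

    rng = np.random.default_rng(seed=0)
    q1 = np.array([1.2, 0.8, 2.0])   # critic 1 estimates for a batch of 3
    q2 = np.array([1.0, 1.1, 1.7])   # critic 2 estimates
    r = np.array([0.1, 0.0, 0.5])
    print(sampled_target_q(q1, q2, r, discount=0.99, rng=rng))

Because the weight is drawn from a fixed, pre-derived interval rather than tuned per task, no new hyperparameter is introduced, which is the sense in which such a rule can be called parameter-free.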
Pages: 25
Related Papers
50 items in total
  • [1] Parameter-Free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients
    Baturay Saglam
    Furkan Burak Mutlu
    Dogan Can Cicek
    Suleyman Serdar Kozat
NEURAL PROCESSING LETTERS, 2024, 56
  • [2] Parameter-Free On-line Deep Learning
    Wawrzynski, Pawel
    AUTOMATION 2017: INNOVATIONS IN AUTOMATION, ROBOTICS AND MEASUREMENT TECHNIQUES, 2017, 550 : 543 - 553
  • [3] Learning to Pour using Deep Deterministic Policy Gradients
    Do, Chau
    Gordillo, Camilo
    Burgard, Wolfram
    2018 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2018, : 3074 - 3079
  • [4] Parameter-free Locally Accelerated Conditional Gradients
    Carderera, Alejandro
    Diakonikolas, Jelena
    Lin, Cheuk Yin
    Pokutta, Sebastian
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [5] Automatic VMAT Machine Parameter Optimization Using Deep Deterministic Policy Gradients
    Hrinivich, W.
    Li, H.
    Lee, J.
    MEDICAL PHYSICS, 2022, 49 (06) : E117 - E117
  • [6] Deep Deterministic Policy Gradients with Transfer Learning Framework in StarCraft Micromanagement
    Xie, Dong
    Zhong, Xiangnan
    2019 IEEE INTERNATIONAL CONFERENCE ON ELECTRO INFORMATION TECHNOLOGY (EIT), 2019, : 410 - 415
  • [7] Deep Ensemble Reinforcement Learning with Multiple Deep Deterministic Policy Gradient Algorithm
    Wu, Junta
    Li, Huiyun
MATHEMATICAL PROBLEMS IN ENGINEERING, 2020, 2020
  • [8] Selective Catalytic Reduction System Ammonia Injection Control Based on Deep Deterministic Policy Reinforcement Learning
    Xie, Peiran
    Zhang, Guangming
    Niu, Yuguang
    Sun, Tianshu
    FRONTIERS IN ENERGY RESEARCH, 2021, 9
  • [9] A parameter-free learning automaton scheme
    Ren, Xudie
    Li, Shenghong
    Ge, Hao
    FRONTIERS IN NEUROROBOTICS, 2022, 16
  • [10] Generative Adversarial Inverse Reinforcement Learning With Deep Deterministic Policy Gradient
    Zhan, Ming
    Fan, Jingjing
    Guo, Jianying
    IEEE ACCESS, 2023, 11 : 87732 - 87746