Distributional reinforcement learning with epistemic and aleatoric uncertainty estimation

Cited by: 5
Authors
Liu, Qi [1]
Li, Yanjie [1]
Chen, Shiyu [1]
Lin, Ke [1]
Shi, Xiongtao [1]
Lou, Yunjiang [1]
Affiliation
[1] Harbin Inst Technol, Dept Control Sci & Engn, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Distributional reinforcement learning; Uncertainty; Risk sensitive policy; Exploration
DOI
10.1016/j.ins.2023.119217
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Distributional reinforcement learning (RL) differs from conventional RL, which estimates only the expectation of the return: it treats the return as a random variable and estimates its distribution. The return distribution carries more information than the expectation used in conventional RL, and distributional RL has therefore been widely studied. However, very few previous works take full advantage of the learned distribution to improve distributional RL. This paper improves distributional RL by introducing epistemic and aleatoric uncertainty estimation. First, an epistemic and aleatoric uncertainty estimation method is introduced using deep ensembles and the learned value distribution. Next, we improve the exploration efficiency of the fully parametrized quantile function (FQF) for distributional RL and obtain an FQF-U (uncertainty) algorithm. Then, to overcome the problem that distributional RL cannot operate over continuous control tasks, we propose an epistemic-uncertainty-based distributional soft actor-critic algorithm with an adaptive risk-averse and risk-seeking policy. Finally, experimental results show that our algorithms outperform the baselines in Atari games and Multi-joint dynamics with contact (MuJoCo) environments.
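The abstract does not spell out how the two uncertainties are computed, so the following is only a minimal sketch of one common way to separate them with an ensemble of quantile estimates, based on the law of total variance. It is not taken from the paper. The function name `decompose_uncertainty`, the input layout, and the assumption of evenly spaced quantile fractions are all hypothetical choices made for illustration.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(ensemble_quantiles):
    """Split return uncertainty into (aleatoric, epistemic) parts.

    ensemble_quantiles[k][i] is the i-th quantile value of the return
    predicted by ensemble member k for one fixed state-action pair.
    Assumption (not from the paper): quantile fractions are evenly
    spaced, so every quantile value carries equal probability mass.
    """
    # Mean and variance of the return distribution learned by each member.
    member_means = [fmean(q) for q in ensemble_quantiles]
    member_vars = [pvariance(q) for q in ensemble_quantiles]

    # Aleatoric part: average spread inside each learned distribution.
    aleatoric = fmean(member_vars)
    # Epistemic part: disagreement between members about the expected return.
    epistemic = pvariance(member_means)
    return aleatoric, epistemic


# Members agree on a spread-out distribution: noise only.
agree = decompose_uncertainty([[0.0, 2.0], [0.0, 2.0]])
# Members predict different deterministic returns: disagreement only.
disagree = decompose_uncertainty([[1.0, 1.0], [3.0, 3.0]])
```

In this sketch the first returned value reflects randomness that more data would not remove, while the second shrinks as the ensemble members converge, which is why a quantity of the second kind is a natural candidate for guiding exploration.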
Pages: 16
Related papers
50 in total
  • [41] The Aleatoric Uncertainty Estimation Using a Separate Formulation with Virtual Residuals
    Kawashima, Takumi
    Yu, Qina
    Asai, Akari
    Ikami, Daiki
    Aizawa, Kiyoharu
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 1438 - 1445
  • [42] OFFLINE REINFORCEMENT LEARNING WITH POLICY GUIDANCE AND UNCERTAINTY ESTIMATION
    Wu, Lan
    Liu, Quan
    Zhang, Lihua
    Huang, Zhigang
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 5010 - 5014
  • [43] Aleatoric and epistemic uncertainty extraction of patient-specific deep learning-based dose predictions in LDR prostate brachytherapy
    Berumen, Francisco
    Ouellet, Samuel
    Enger, Shirin
    Beaulieu, Luc
    PHYSICS IN MEDICINE AND BIOLOGY, 2024, 69 (08):
  • [44] Calibrated Aleatoric Uncertainty-Based Adaptive Label Distribution Learning for Pose Estimation of Sichuan Peppers
    Liu, Xueyin
    Dong, Dawei
    Luo, Jianqiao
    Li, Bailin
    IEEE SENSORS JOURNAL, 2024, 24 (07) : 10727 - 10741
  • [45] Epistemic uncertainty estimation with evidential learning on semantic segmentation of underwater images
    Do Nascimento, Gustavo Henrique
    Dias De Oliveira Evald, Paulo Jefferson
    Drews Junior, Paulo Lilles Jorge
    2022 LATIN AMERICAN ROBOTICS SYMPOSIUM (LARS), 2022 BRAZILIAN SYMPOSIUM ON ROBOTICS (SBR), AND 2022 WORKSHOP ON ROBOTICS IN EDUCATION (WRE), 2022, : 163 - 168
  • [46] Reinforcement Learning using Reward Expectations in Scenarios with Aleatoric Uncertainties
    Wang, Yubin
    Sun, Yifeng
    Wu, Jiang
    Hu, Hao
    Wu, Zhiqiang
    Huang, Weigui
    2022 IEEE CONFERENCE ON GAMES, COG, 2022, : 261 - 267
  • [47] Graph neural network interatomic potential ensembles with calibrated aleatoric and epistemic uncertainty on energy and forces
    Busk, Jonas
    Schmidt, Mikkel N.
    Winther, Ole
    Vegge, Tejs
    Jorgensen, Peter Bjorn
    PHYSICAL CHEMISTRY CHEMICAL PHYSICS, 2023, 25 (37) : 25828 - 25837
  • [48] Probabilistic unifying relations for modelling epistemic and aleatoric uncertainty: Semantics and automated reasoning with theorem proving
    Ye, Kangfeng
    Woodcock, Jim
    Foster, Simon
    THEORETICAL COMPUTER SCIENCE, 2024, 1021
  • [49] Distributional Reward Estimation for Effective Multi-Agent Deep Reinforcement Learning
    Hu, Jifeng
    Sun, Yanchao
    Chen, Hechang
    Huang, Sili
    Piao, Haiyin
    Chang, Yi
    Sun, Lichao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [50] Epistemic and aleatoric uncertainty quantification for crack detection using a Bayesian Boundary Aware Convolutional Network
    Rathnakumar, Rahul
    Pang, Yutian
    Liu, Yongming
    RELIABILITY ENGINEERING & SYSTEM SAFETY, 2023, 240