Distributional Deep Reinforcement Learning with a Mixture of Gaussians

Cited by: 0
Authors:
Choi, Yunho [1 ,2 ]
Lee, Kyungjae [1 ,2 ]
Oh, Songhwai [1 ,2 ]
Affiliations:
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 08826, South Korea
[2] Seoul Natl Univ, ASRI, Seoul 08826, South Korea
Source:
2019 INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2019
Funding:
National Research Foundation of Singapore;
DOI:
10.1109/icra.2019.8793505
Chinese Library Classification:
TP [Automation Technology, Computer Technology];
Subject Classification Code:
0812;
Abstract:
In this paper, we propose a novel distributional reinforcement learning (RL) method which models the distribution of the sum of rewards using a mixture density network. Recently, it has been shown that modeling the randomness of the return distribution leads to better performance in Atari games and control tasks. Despite the success of the prior work, it has limitations which come from the use of a discrete distribution. First, it needs a projection step and a softmax parametrization for the distribution, since it minimizes the KL divergence loss. Second, its performance depends on discretization hyperparameters, such as the number of atoms and the bounds of the support, which require domain knowledge. We mitigate these problems with the proposed parameterization, a mixture of Gaussians. Furthermore, we propose a new distance metric called the Jensen-Tsallis distance, which allows the computation of the distance between two mixtures of Gaussians in a closed form. We have conducted various experiments to validate the proposed method, including Atari games and autonomous vehicle driving.
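The closed-form property mentioned in the abstract can be illustrated for the order-2 case. With the common definition of the Tsallis entropy of order 2, T_2(p) = 1 - ∫p(x)² dx, the Jensen-Tsallis divergence between p and q reduces algebraically to (1/4)∫(p(x) - q(x))² dx, and every term expands into integrals of products of Gaussians, which are available in closed form. The sketch below is our own minimal illustration of that computation for scalar return distributions, not the authors' implementation; the function names and the choice of order q = 2 are assumptions.

```python
import numpy as np

def gauss_overlap(m1, s1, m2, s2):
    # Closed form of the integral of N(x; m1, s1^2) * N(x; m2, s2^2) over x:
    # a Gaussian density in (m1 - m2) with variance s1^2 + s2^2.
    var = s1 ** 2 + s2 ** 2
    return np.exp(-(m1 - m2) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def jt2_distance(w_p, mu_p, sig_p, w_q, mu_q, sig_q):
    # Jensen-Tsallis divergence of order 2 between two Gaussian mixtures
    # p = sum_i w_p[i] N(mu_p[i], sig_p[i]^2) and q likewise.
    # Algebraically this equals (1/4) * integral of (p - q)^2, so it needs
    # only pairwise Gaussian overlap integrals -- no sampling, no atoms.
    def cross(wa, ma, sa, wb, mb, sb):
        return sum(wa[i] * wb[j] * gauss_overlap(ma[i], sa[i], mb[j], sb[j])
                   for i in range(len(wa)) for j in range(len(wb)))
    return 0.25 * (cross(w_p, mu_p, sig_p, w_p, mu_p, sig_p)
                   - 2.0 * cross(w_p, mu_p, sig_p, w_q, mu_q, sig_q)
                   + cross(w_q, mu_q, sig_q, w_q, mu_q, sig_q))
```

The distance is zero for identical mixtures, symmetric, and differentiable in the mixture parameters, which is what makes it usable as a loss for a mixture density network head in place of the projected KL loss used with discrete distributions.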
Pages: 9791-9797
Number of pages: 7