Variational value learning in advantage actor-critic reinforcement learning

Authors
Zhang, Yaozhong [1]
Han, Jiaqi [2]
Hu, Xiaofang [3]
Dan, Shihao [1]
Affiliations
[1] Southwest Univ, Sch Comp & Informat Sci, Chongqing, Peoples R China
[2] Southwest Univ, Sch Hanhong, Chongqing, Peoples R China
[3] Southwest Univ, Coll Artificial Intelligence, Chongqing, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
Bayesian neural networks; variational inference; reinforcement learning; NETWORKS;
DOI
10.1109/CAC51589.2020.9324530
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The performance and applications of reinforcement learning based on artificial neural networks (ANNs) have been limited by the overfitting problems of ANNs. With weights represented as probability distributions and trained by variational inference, Bayesian neural networks (BNNs) can reduce overfitting and thus improve the generalization ability of models. This paper proposes an advantage actor-variational-critic reinforcement learning algorithm (called A2VC) based on advantage actor-critic reinforcement learning (A2C). We model the value function as a probability distribution and implement that distribution with the critic BNN. The weights of the critic BNN are optimized using variational inference and the reparameterization trick, while the weights of the actor ANN are optimized with the stochastic policy gradient. Simulations in the lunar lander and cart-pole environments show the effectiveness and advantages of the proposed scheme over the conventional A2C algorithm in the learning and decision-making capacity of agents.
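The abstract names three ingredients: weights sampled through the reparameterization trick, a variational (KL) penalty on the critic, and the advantage signal that drives the actor. The sketch below illustrates those three pieces only. It is not the paper's implementation: the linear critic, the softplus parameterization of the standard deviation, the standard normal prior, and every function name are assumptions made for illustration, since this record gives no details of the A2VC loss or network architecture.

```python
import math
import random


def softplus(x):
    """Numerically stable log(1 + exp(x)); maps rho to a positive std (assumed parameterization)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sample_weights(mu, rho, rng):
    """Reparameterization trick: w = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + softplus(r) * rng.gauss(0.0, 1.0) for m, r in zip(mu, rho)]


def kl_to_standard_normal(mu, rho):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over weights (prior is an assumption)."""
    total = 0.0
    for m, r in zip(mu, rho):
        sigma = softplus(r)
        total += 0.5 * (sigma ** 2 + m ** 2 - 1.0) - math.log(sigma)
    return total


def value(weights, state):
    """Linear critic V(s) = w . s, a hypothetical stand-in for the critic BNN."""
    return sum(w * s for w, s in zip(weights, state))


def advantage(weights, state, reward, next_state, gamma=0.99, done=False):
    """One-step advantage estimate A = r + gamma * V(s') - V(s)."""
    bootstrap = 0.0 if done else gamma * value(weights, next_state)
    return reward + bootstrap - value(weights, state)


# Illustrative values only; none of these numbers come from the paper.
rng = random.Random(0)
mu = [0.5, -0.2]    # variational means of the critic weights
rho = [-3.0, -3.0]  # pre-softplus standard deviations
w = sample_weights(mu, rho, rng)
adv = advantage(w, state=[1.0, 0.0], reward=1.0, next_state=[0.0, 1.0])
kl = kl_to_standard_normal(mu, rho)
```

In a full training loop, one would typically minimize a critic loss that combines a value-prediction error with a weighted `kl` term, and scale the actor's log-probability gradient by `adv`. How A2VC weights or combines these terms is not stated in this record.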
Pages: 1955-1960
Page count: 6