Variational value learning in advantage actor-critic reinforcement learning

Citations: 0
Authors
Zhang, Yaozhong [1 ]
Han, Jiaqi [2 ]
Hu, Xiaofang [3 ]
Dan, Shihao [1 ]
Affiliations
[1] Southwest Univ, Sch Comp & Informat Sci, Chongqing, Peoples R China
[2] Southwest Univ, Sch Hanhong, Chongqing, Peoples R China
[3] Southwest Univ, Coll Artificial Intelligence, Chongqing, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China
Keywords
Bayesian neural networks; variational inference; reinforcement learning
DOI
10.1109/CAC51589.2020.9324530
CLC Number
TP [automation and computer technology]
Discipline Code
0812
Abstract
The performance and applicability of reinforcement learning based on artificial neural networks (ANNs) have been limited by the tendency of ANNs to overfit. By placing probability distributions over network weights and applying variational inference, Bayesian neural networks (BNNs) can reduce overfitting and thus improve a model's generalization ability. This paper proposes an advantage actor-variational-critic reinforcement learning algorithm (A2VC) based on advantage actor-critic reinforcement learning (A2C). We model the value function as a probability distribution and implement this distribution with a critic BNN, whose weights are optimized using variational inference and the reparameterization trick; the weights of the actor ANN are optimized with the stochastic policy gradient. Simulations in the lunar-lander and cart-pole environments show that the proposed scheme outperforms the conventional A2C algorithm in the learning and decision-making capacity of agents.
Pages: 1955-1960 (6 pages)
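
The paper itself does not include code; the following PyTorch fragment is a minimal sketch of the kind of variational critic the abstract describes: a value network with a factorized Gaussian posterior over its weights, sampled via the reparameterization trick and regularized by a KL penalty toward a standard-normal prior. All names, layer sizes, and hyperparameters here (BayesianLinear, hidden=64, prior_std, kl_weight) are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a variational (Bayesian) value critic, assuming a Gaussian
# mean-field posterior over weights. Illustrative only, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BayesianLinear(nn.Module):
    """Linear layer with a factorized Gaussian posterior over its weights.

    Weights are sampled with the reparameterization trick,
    w = mu + sigma * eps with eps ~ N(0, I), so gradients flow
    through mu and rho (where sigma = softplus(rho)).
    """

    def __init__(self, in_features, out_features, prior_std=1.0):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(out_features, in_features))
        self.w_rho = nn.Parameter(torch.full((out_features, in_features), -3.0))
        self.b_mu = nn.Parameter(torch.zeros(out_features))
        self.b_rho = nn.Parameter(torch.full((out_features,), -3.0))
        self.prior_std = prior_std

    def forward(self, x):
        w_sigma = F.softplus(self.w_rho)
        b_sigma = F.softplus(self.b_rho)
        # Reparameterization trick: sample weights while keeping
        # gradients with respect to the variational parameters.
        w = self.w_mu + w_sigma * torch.randn_like(w_sigma)
        b = self.b_mu + b_sigma * torch.randn_like(b_sigma)
        return F.linear(x, w, b)

    def kl_to_prior(self):
        # Closed-form KL( N(mu, sigma^2) || N(0, prior_std^2) ), summed.
        def kl(mu, sigma):
            return (torch.log(self.prior_std / sigma)
                    + (sigma ** 2 + mu ** 2) / (2 * self.prior_std ** 2)
                    - 0.5).sum()
        return (kl(self.w_mu, F.softplus(self.w_rho))
                + kl(self.b_mu, F.softplus(self.b_rho)))


class VariationalCritic(nn.Module):
    """State-value head whose weights are distributions, not point estimates."""

    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.fc1 = BayesianLinear(obs_dim, hidden)
        self.fc2 = BayesianLinear(hidden, 1)

    def forward(self, obs):
        return self.fc2(torch.tanh(self.fc1(obs))).squeeze(-1)

    def loss(self, obs, value_targets, kl_weight=1e-3):
        # ELBO-style objective: regression to bootstrapped return targets
        # plus a KL penalty pulling the weight posterior toward the prior.
        mse = F.mse_loss(self(obs), value_targets)
        kl = self.fc1.kl_to_prior() + self.fc2.kl_to_prior()
        return mse + kl_weight * kl
```

In an A2C-style loop, value_targets would be the usual bootstrapped returns, and the advantage used for the actor's policy-gradient update would be computed from sampled critic outputs; the kl_weight trade-off between fit and regularization is an assumed hyperparameter, not one specified in the abstract.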