Variational value learning in advantage actor-critic reinforcement learning

Times cited: 0
Authors
Zhang, Yaozhong [1]
Han, Jiaqi [2]
Hu, Xiaofang [3]
Dan, Shihao [1]
Affiliations
[1] Southwest Univ, Sch Comp & Informat Sci, Chongqing, Peoples R China
[2] Southwest Univ, Sch Hanhong, Chongqing, Peoples R China
[3] Southwest Univ, Coll Artificial Intelligence, Chongqing, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
Bayesian Neural Networks; variational inference; reinforcement learning; NETWORKS;
DOI
10.1109/CAC51589.2020.9324530
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
The performance and applications of reinforcement learning based on artificial neural networks (ANNs) have been limited by the tendency of ANNs to overfit. Bayesian neural networks (BNNs) place probability distributions over their weights and are trained with variational inference. As a result, they can reduce overfitting to the data and improve the generalization ability of models. This paper proposes an advantage actor-variational-critic reinforcement learning algorithm (called A2VC) based on advantage actor-critic reinforcement learning (A2C). We model the value function as a probability distribution and implement this distribution with a critic BNN. The weights of the critic BNN are optimized using variational inference and the reparameterization trick. The weights of the actor ANN are optimized with the stochastic policy gradient. Simulations in the lunar lander and cart-pole environments show that the proposed scheme improves the learning and decision-making capacity of agents compared with the conventional A2C algorithm.
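The abstract describes the method only at a high level. The following is a minimal, dependency-free Python sketch of the ideas it names. Several assumptions are not taken from the paper:

- The critic is a single Bayesian linear layer rather than a deep BNN. It has a factorized Gaussian posterior over its weights and is trained with one-sample reparameterized gradients of a negative ELBO (squared TD error plus a weighted KL term, in the style of Bayes by Backprop).
- The actor is a linear softmax policy.
- The advantage is a one-step TD error computed with the posterior-mean weights.

All class and function names (`BayesianLinearCritic`, `SoftmaxActor`, `a2vc_step`) and hyperparameters (`prior_std`, `kl_weight`, learning rates) are hypothetical.

```python
import math
import random


def softplus(x):
    return math.log1p(math.exp(x))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class BayesianLinearCritic:
    """Hypothetical Bayesian linear critic: V(s) = w . s with a factorized
    Gaussian posterior q(w) = N(mu, softplus(rho)^2) and prior N(0, prior_std^2)."""

    def __init__(self, n_features, prior_std=1.0, seed=0):
        self.rng = random.Random(seed)
        self.mu = [0.0] * n_features
        self.rho = [-3.0] * n_features  # softplus(-3) ~= 0.049: small initial std
        self.prior_std = prior_std

    def sample_weights(self):
        # Reparameterization trick: w = mu + sigma * eps with eps ~ N(0, 1),
        # so gradients reach mu and rho through a deterministic function.
        eps = [self.rng.gauss(0.0, 1.0) for _ in self.mu]
        w = [m + softplus(r) * e for m, r, e in zip(self.mu, self.rho, eps)]
        return w, eps

    def value(self, state, weights=None):
        # Uses the posterior-mean weights when no sample is given.
        w = self.mu if weights is None else weights
        return sum(wi * si for wi, si in zip(w, state))

    def kl(self):
        # Closed-form KL(q || p) summed over independent Gaussian weights.
        p2 = self.prior_std ** 2
        total = 0.0
        for m, r in zip(self.mu, self.rho):
            s = softplus(r)
            total += math.log(self.prior_std / s) + (s * s + m * m) / (2 * p2) - 0.5
        return total

    def update(self, state, target, lr=0.01, kl_weight=1e-3):
        """One-sample gradient step on the negative ELBO
        0.5 * (V_w(s) - target)^2 + kl_weight * KL(q || p)."""
        p2 = self.prior_std ** 2
        w, eps = self.sample_weights()
        err = self.value(state, w) - target  # d(data loss)/dV
        for i, s_i in enumerate(state):
            sigma = softplus(self.rho[i])
            g_w = err * s_i  # d(data loss)/d(w_i)
            g_mu = g_w + kl_weight * self.mu[i] / p2
            g_sigma = g_w * eps[i] + kl_weight * (-1.0 / sigma + sigma / p2)
            self.mu[i] -= lr * g_mu
            # Chain rule through sigma = softplus(rho): d(sigma)/d(rho) = sigmoid(rho)
            self.rho[i] -= lr * g_sigma * sigmoid(self.rho[i])
        return err


class SoftmaxActor:
    """Hypothetical linear softmax policy: pi(a|s) proportional to exp(theta_a . s)."""

    def __init__(self, n_features, n_actions, seed=0):
        self.theta = [[0.0] * n_features for _ in range(n_actions)]
        self.rng = random.Random(seed)

    def probs(self, state):
        logits = [sum(t * s for t, s in zip(row, state)) for row in self.theta]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - top) for x in logits]
        z = sum(exps)
        return [e / z for e in exps]

    def act(self, state):
        return self.rng.choices(range(len(self.theta)), weights=self.probs(state))[0]

    def update(self, state, action, advantage, lr=0.01):
        # Stochastic policy gradient ascent: theta += lr * A * grad log pi(a|s),
        # where d log pi(a|s) / d theta_b = (1[a == b] - pi(b|s)) * s.
        p = self.probs(state)
        for b, row in enumerate(self.theta):
            coef = (1.0 if b == action else 0.0) - p[b]
            for j, s_j in enumerate(state):
                row[j] += lr * advantage * coef * s_j


def a2vc_step(actor, critic, s, a, r, s_next, done, gamma=0.99):
    """One hypothetical actor-critic update on a single transition (s, a, r, s_next)."""
    v_next = 0.0 if done else critic.value(s_next)
    td_target = r + gamma * v_next
    advantage = td_target - critic.value(s)  # one-step advantage from posterior mean
    critic.update(s, td_target)
    actor.update(s, a, advantage)
    return advantage
```

Using the posterior mean to compute the advantage is a simplifying choice. An alternative is to average values under several sampled weight vectors. The abstract does not say which approach the authors use.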
Pages: 1955-1960
Page count: 6
Related papers
50 in total
  • [41] Natural Actor-Critic for Robust Reinforcement Learning with Function Approximation
    Zhou, Ruida
    Liu, Tao
    Cheng, Min
    Kalathil, Dileep
    Kumar, P. R.
    Tian, Chao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [42] Dual Variable Actor-Critic for Adaptive Safe Reinforcement Learning
    Lee, Junseo
    Heo, Jaeseok
    Kim, Dohyeong
    Lee, Gunmin
    Oh, Songhwai
    2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2023: 7568-7573
  • [43] Dynamic Charging Scheme Problem With Actor-Critic Reinforcement Learning
    Yang, Meiyi
    Liu, Nianbo
    Zuo, Lin
    Feng, Yong
    Liu, Minghui
    Gong, Haigang
    Liu, Ming
    IEEE INTERNET OF THINGS JOURNAL, 2021, 8 (01): 370-380
  • [44] Automated State Feature Learning for Actor-Critic Reinforcement Learning through NEAT
    Peng, Yiming
    Chen, Gang
    Holdaway, Scott
    Mei, Yi
    Zhang, Mengjie
    PROCEEDINGS OF THE 2017 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE COMPANION (GECCO'17 COMPANION), 2017: 135-136
  • [45] A Parallel Approach to Advantage Actor Critic in Deep Reinforcement Learning
    Zhu, Xing
    Du, Yunfei
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2019, PT II, 2020, 11945: 320-327
  • [46] Cooperative Advantage Actor-Critic Reinforcement Learning for Multiagent Pursuit-Evasion Games on Communication Graphs
    Meng, Yizhen
    Liu, Chun
    Wang, Qiang
    Tan, Longyu
    IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, 2024, 5 (12): 6509-6523
  • [47] Bringing Fairness to Actor-Critic Reinforcement Learning for Network Utility Optimization
    Chen, Jingdi
    Wang, Yimeng
    Lan, Tian
    IEEE CONFERENCE ON COMPUTER COMMUNICATIONS (IEEE INFOCOM 2021), 2021
  • [48] An extension of Genetic Network Programming with Reinforcement Learning using actor-critic
    Hatakeyama, Hiroyuki
    Mabu, Shingo
    Hirasawa, Kotaro
    Hu, Jinglu
    2006 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION, VOLS 1-6, 2006: 1522+
  • [49] A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients
    Grondman, Ivo
    Busoniu, Lucian
    Lopes, Gabriel A. D.
    Babuska, Robert
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 2012, 42 (06): 1291-1307
  • [50] Exponential TD Learning: A Risk-Sensitive Actor-Critic Reinforcement Learning Algorithm
    Noorani, Erfaun
    Mavridis, Christos N.
    Baras, John S.
    2023 AMERICAN CONTROL CONFERENCE, ACC, 2023: 4104-4109