Variational value learning in advantage actor-critic reinforcement learning

Cited by: 0
Authors
Zhang, Yaozhong [1]
Han, Jiaqi [2]
Hu, Xiaofang [3]
Dan, Shihao [1]
Affiliations
[1] Southwest Univ, Sch Comp & Informat Sci, Chongqing, Peoples R China
[2] Southwest Univ, Sch Hanhong, Chongqing, Peoples R China
[3] Southwest Univ, Coll Artificial Intelligence, Chongqing, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Bayesian Neural Networks; variational inference; reinforcement learning; NETWORKS;
DOI
10.1109/CAC51589.2020.9324530
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
The performance and applications of reinforcement learning based on artificial neural networks (ANNs) have been limited by the overfitting problems of ANNs. With probability distributions over weights and the variational inference technique, Bayesian neural networks (BNNs) can reduce overfitting and thus improve the generalization ability of models. This paper proposes an advantage actor-variational-critic reinforcement learning algorithm (called A2VC) based on advantage actor-critic reinforcement learning (A2C). We model the value function as a probability distribution and implement this distribution with a critic BNN. Based on the variational inference technique and the reparameterization trick, the weights of the critic BNN are optimized; the weights of the actor ANN are optimized with the stochastic policy gradient. Simulations in the lunar lander and cart-pole environments show the effectiveness and advantages of the proposed scheme over the conventional A2C algorithm in terms of the learning and decision-making capacity of agents.
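The abstract describes the core of A2VC: a critic implemented as a Bayesian neural network whose weight distributions are trained by variational inference with the reparameterization trick, and an ordinary actor network trained with the stochastic policy gradient. Below is a minimal sketch of that scheme, assuming a PyTorch implementation; the layer sizes, the standard-normal prior, the KL weight, and the helper names (BayesianLinear, VariationalCritic, a2vc_losses) are illustrative assumptions, not the authors' reported configuration.

```python
# Minimal sketch of the A2VC idea (variational critic + ordinary actor), assuming
# PyTorch; sizes, the N(0, 1) prior, and the KL weight are illustrative guesses.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesianLinear(nn.Module):
    """Linear layer with a Gaussian posterior over weights, sampled with the
    reparameterization trick: w = mu + softplus(rho) * eps."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.w_mu = nn.Parameter(0.1 * torch.randn(n_out, n_in))
        self.w_rho = nn.Parameter(torch.full((n_out, n_in), -3.0))
        self.b_mu = nn.Parameter(torch.zeros(n_out))
        self.b_rho = nn.Parameter(torch.full((n_out,), -3.0))

    def forward(self, x):
        w_sigma = F.softplus(self.w_rho)
        b_sigma = F.softplus(self.b_rho)
        w = self.w_mu + w_sigma * torch.randn_like(w_sigma)
        b = self.b_mu + b_sigma * torch.randn_like(b_sigma)
        return F.linear(x, w, b)

    def kl(self):
        # KL( N(mu, sigma^2) || N(0, 1) ), summed over all weights and biases.
        total = 0.0
        for mu, rho in ((self.w_mu, self.w_rho), (self.b_mu, self.b_rho)):
            sigma = F.softplus(rho)
            total = total + (0.5 * (sigma**2 + mu**2 - 1.0) - torch.log(sigma)).sum()
        return total

class VariationalCritic(nn.Module):
    """Critic BNN: each forward pass returns a sampled state value V(s)."""
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.l1 = BayesianLinear(obs_dim, hidden)
        self.l2 = BayesianLinear(hidden, 1)

    def forward(self, obs):
        return self.l2(torch.tanh(self.l1(obs))).squeeze(-1)

    def kl(self):
        return self.l1.kl() + self.l2.kl()

def a2vc_losses(actor, critic, obs, actions, returns, kl_weight=1e-3):
    """One A2C-style loss computation with a variational critic (hypothetical helper).

    actor: ordinary nn.Module mapping obs -> discrete action logits.
    returns: discounted (e.g. n-step) returns used as value targets.
    """
    values = critic(obs)                      # sampled V(s) from the critic BNN
    advantages = (returns - values).detach()  # advantage estimate for the actor
    # Critic ELBO-style loss: expected squared error plus weighted KL term.
    critic_loss = F.mse_loss(values, returns) + kl_weight * critic.kl()
    # Actor loss: stochastic policy gradient weighted by the advantage.
    log_probs = torch.distributions.Categorical(logits=actor(obs)).log_prob(actions)
    actor_loss = -(log_probs * advantages).mean()
    return actor_loss, critic_loss
```

In this sketch the two losses would be minimized by separate optimizers, so the critic's (mu, rho) parameters are updated through the ELBO-style objective while the actor is updated through the advantage-weighted policy gradient, mirroring the split described in the abstract.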
Pages: 1955 - 1960
Number of pages: 6
Related Papers
50 records
  • [31] Enhancing cotton irrigation with distributional actor-critic reinforcement learning
    Chen, Yi
    Lin, Meiwei
    Yu, Zhuo
    Sun, Weihong
    Fu, Weiguo
    He, Liang
    AGRICULTURAL WATER MANAGEMENT, 2025, 307
  • [32] Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
    Zanette, Andrea
    Wainwright, Martin J.
    Brunskill, Emma
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [33] Swarm Reinforcement Learning Method Based on an Actor-Critic Method
    Iima, Hitoshi
    Kuroe, Yasuaki
    SIMULATED EVOLUTION AND LEARNING, 2010, 6457 : 279 - 288
  • [34] Manipulator Motion Planning based on Actor-Critic Reinforcement Learning
    Li, Qiang
    Nie, Jun
    Wang, Haixia
    Lu, Xiao
    Song, Shibin
    2021 PROCEEDINGS OF THE 40TH CHINESE CONTROL CONFERENCE (CCC), 2021, : 4248 - 4254
  • [35] Hybrid Actor-Critic Reinforcement Learning in Parameterized Action Space
    Fan, Zhou
    Su, Rui
    Zhang, Weinan
    Yu, Yong
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 2279 - 2285
  • [36] Actor-critic reinforcement learning for the feedback control of a swinging chain
    Dengler, C.
    Lohmann, B.
    IFAC PAPERSONLINE, 2018, 51 (13): 378 - 383
  • [37] A Prioritized objective actor-critic method for deep reinforcement learning
    Nguyen, Ngoc Duy
    Nguyen, Thanh Thi
    Vamplew, Peter
    Dazeley, Richard
    Nahavandi, Saeid
    Neural Computing and Applications, 2021, 33 : 10335 - 10349
  • [38] A Prioritized objective actor-critic method for deep reinforcement learning
    Nguyen, Ngoc Duy
    Nguyen, Thanh Thi
    Vamplew, Peter
    Dazeley, Richard
    Nahavandi, Saeid
    NEURAL COMPUTING & APPLICATIONS, 2021, 33 (16): 10335 - 10349
  • [39] Evaluating Correctness of Reinforcement Learning based on Actor-Critic Algorithm
    Kim, Youngjae
    Hussain, Manzoor
    Suh, Jae-Won
    Hong, Jang-Eui
    2022 THIRTEENTH INTERNATIONAL CONFERENCE ON UBIQUITOUS AND FUTURE NETWORKS (ICUFN), 2022, : 320 - 325
  • [40] Asymmetric Actor-Critic for Adapting to Changing Environments in Reinforcement Learning
    Yue, Wangyang
    Zhou, Yuan
    Zhang, Xiaochuan
    Hua, Yuchen
    Li, Minne
    Fan, Zunlin
    Wang, Zhiyuan
    Kou, Guang
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT IV, 2024, 15019 : 325 - 339