Variational value learning in advantage actor-critic reinforcement learning

Times cited: 0
Authors
Zhang, Yaozhong [1 ]
Han, Jiaqi [2 ]
Hu, Xiaofang [3 ]
Dan, Shihao [1 ]
Affiliations
[1] Southwest Univ, Sch Comp & Informat Sci, Chongqing, Peoples R China
[2] Southwest Univ, Sch Hanhong, Chongqing, Peoples R China
[3] Southwest Univ, Coll Artificial Intelligence, Chongqing, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Bayesian neural networks; variational inference; reinforcement learning; NETWORKS;
DOI
10.1109/CAC51589.2020.9324530
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology];
Discipline code
0812 ;
Abstract
The performance and applicability of reinforcement learning based on artificial neural networks (ANNs) have been limited by the overfitting problems of ANNs. With probability-distributional weights and the variational inference technique, Bayesian neural networks (BNNs) can reduce overfitting and thus improve the generalization ability of models. This paper proposes an advantage actor-variational-critic reinforcement learning algorithm (called A2VC) based on advantage actor-critic reinforcement learning (A2C). We model the value function as a probability distribution, implemented by the critic BNN. Based on the variational inference technique and the reparameterization trick, the weights of the critic BNN are effectively optimized; the weights of the actor ANN, in turn, are optimized with the stochastic policy gradient. Simulations in the lunar lander and cart-pole environments show the effectiveness and advantages of the proposed scheme over the conventional A2C algorithm in the learning and decision-making capacity of agents.
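The abstract's core mechanism is a critic whose weights are distributions rather than point estimates, sampled via the reparameterization trick so the variational parameters remain trainable. Below is a minimal NumPy sketch of that idea for a single linear value layer; it is an illustrative assumption, not the authors' implementation, and the class name `VariationalCritic` and the standard-normal prior are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def softplus(x):
    # Maps the unconstrained parameter rho to a positive std deviation.
    return np.log1p(np.exp(x))

class VariationalCritic:
    """Linear critic with a Gaussian weight posterior q(w) = N(mu, sigma^2),
    where sigma = softplus(rho). Value estimates are stochastic."""

    def __init__(self, state_dim):
        self.mu = np.zeros(state_dim)          # posterior means
        self.rho = np.full(state_dim, -3.0)    # small initial sigma

    def sample_value(self, state):
        # Reparameterization trick: w = mu + sigma * eps with eps ~ N(0, I),
        # so gradients could flow to mu and rho through the sample.
        sigma = softplus(self.rho)
        eps = rng.standard_normal(self.mu.shape)
        w = self.mu + sigma * eps
        return float(state @ w)

    def kl_to_standard_normal(self):
        # KL( N(mu, sigma^2) || N(0, 1) ), summed over weights; this is the
        # regularization term of the variational (ELBO) objective.
        sigma = softplus(self.rho)
        return float(np.sum(np.log(1.0 / sigma)
                            + (sigma**2 + self.mu**2) / 2.0 - 0.5))

critic = VariationalCritic(state_dim=4)
s = np.ones(4)
v1, v2 = critic.sample_value(s), critic.sample_value(s)  # two stochastic values
kl = critic.kl_to_standard_normal()
```

In a full A2VC-style training loop, the critic loss would combine a value-regression term (e.g. squared TD error against the sampled value) with this KL term, while the actor is updated separately with the stochastic policy gradient.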
Pages: 1955-1960 (6 pages)