Variational value learning in advantage actor-critic reinforcement learning

Cited by: 0
Authors
Zhang, Yaozhong [1]
Han, Jiaqi [2]
Hu, Xiaofang [3]
Dan, Shihao [1]
Affiliations
[1] Southwest Univ, Sch Comp & Informat Sci, Chongqing, Peoples R China
[2] Southwest Univ, Sch Hanhong, Chongqing, Peoples R China
[3] Southwest Univ, Coll Artificial Intelligence, Chongqing, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Bayesian Neural Networks; variational inference; reinforcement learning; NETWORKS;
DOI
10.1109/CAC51589.2020.9324530
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
The performance and applications of reinforcement learning based on artificial neural networks (ANNs) have been limited by the overfitting problems of ANNs. With probability distributions over weights and the variational inference technique, Bayesian neural networks (BNNs) can reduce overfitting and thus improve the generalization ability of models. This paper proposes an advantage actor-variational-critic reinforcement learning algorithm (called A2VC) based on advantage actor-critic reinforcement learning (A2C). We model the value function as a probability distribution and implement this distribution with a critic BNN. Based on the variational inference technique and the reparameterization trick, the weights of the critic BNN are optimized; the weights of the actor ANN are optimized with the stochastic policy gradient. Simulations in the lunar lander and cart-pole environments show the effectiveness and advantages of the proposed scheme over the conventional A2C algorithm in terms of the learning and decision-making capacity of agents.
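The abstract describes the core of A2VC: a critic implemented as a Bayesian neural network whose weight distributions are trained by variational inference with the reparameterization trick, and an ordinary actor network trained with the stochastic policy gradient. Below is a minimal sketch of that scheme, assuming a PyTorch implementation; the layer sizes, the standard-normal prior, the KL weight, and the helper names (BayesianLinear, VariationalCritic, a2vc_losses) are illustrative assumptions, not the authors' reported configuration.

```python
# Minimal sketch of the A2VC idea (variational critic + ordinary actor), assuming
# PyTorch; sizes, the N(0, 1) prior, and the KL weight are illustrative guesses.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesianLinear(nn.Module):
    """Linear layer with a Gaussian posterior over weights, sampled with the
    reparameterization trick: w = mu + softplus(rho) * eps."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.w_mu = nn.Parameter(0.1 * torch.randn(n_out, n_in))
        self.w_rho = nn.Parameter(torch.full((n_out, n_in), -3.0))
        self.b_mu = nn.Parameter(torch.zeros(n_out))
        self.b_rho = nn.Parameter(torch.full((n_out,), -3.0))

    def forward(self, x):
        w_sigma = F.softplus(self.w_rho)
        b_sigma = F.softplus(self.b_rho)
        w = self.w_mu + w_sigma * torch.randn_like(w_sigma)
        b = self.b_mu + b_sigma * torch.randn_like(b_sigma)
        return F.linear(x, w, b)

    def kl(self):
        # KL( N(mu, sigma^2) || N(0, 1) ), summed over all weights and biases.
        total = 0.0
        for mu, rho in ((self.w_mu, self.w_rho), (self.b_mu, self.b_rho)):
            sigma = F.softplus(rho)
            total = total + (0.5 * (sigma**2 + mu**2 - 1.0) - torch.log(sigma)).sum()
        return total

class VariationalCritic(nn.Module):
    """Critic BNN: each forward pass returns a sampled state value V(s)."""
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.l1 = BayesianLinear(obs_dim, hidden)
        self.l2 = BayesianLinear(hidden, 1)

    def forward(self, obs):
        return self.l2(torch.tanh(self.l1(obs))).squeeze(-1)

    def kl(self):
        return self.l1.kl() + self.l2.kl()

def a2vc_losses(actor, critic, obs, actions, returns, kl_weight=1e-3):
    """One A2C-style loss computation with a variational critic (hypothetical helper).

    actor: ordinary nn.Module mapping obs -> discrete action logits.
    returns: discounted (e.g. n-step) returns used as value targets.
    """
    values = critic(obs)                      # sampled V(s) from the critic BNN
    advantages = (returns - values).detach()  # advantage estimate for the actor
    # Critic ELBO-style loss: expected squared error plus weighted KL term.
    critic_loss = F.mse_loss(values, returns) + kl_weight * critic.kl()
    # Actor loss: stochastic policy gradient weighted by the advantage.
    log_probs = torch.distributions.Categorical(logits=actor(obs)).log_prob(actions)
    actor_loss = -(log_probs * advantages).mean()
    return actor_loss, critic_loss
```

In this sketch the two losses would be minimized by separate optimizers, so the critic's (mu, rho) parameters are updated through the ELBO-style objective while the actor is updated through the advantage-weighted policy gradient, mirroring the split described in the abstract.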
Pages: 1955 - 1960
Number of pages: 6
Related Papers
50 records
  • [31] Enhancing cotton irrigation with distributional actor-critic reinforcement learning
    Chen, Yi
    Lin, Meiwei
    Yu, Zhuo
    Sun, Weihong
    Fu, Weiguo
    He, Liang
    AGRICULTURAL WATER MANAGEMENT, 2025, 307
  • [32] Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
    Zanette, Andrea
    Wainwright, Martin J.
    Brunskill, Emma
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [33] Swarm Reinforcement Learning Method Based on an Actor-Critic Method
    Iima, Hitoshi
    Kuroe, Yasuaki
    SIMULATED EVOLUTION AND LEARNING, 2010, 6457 : 279 - 288
  • [34] Manipulator Motion Planning based on Actor-Critic Reinforcement Learning
    Li, Qiang
    Nie, Jun
    Wang, Haixia
    Lu, Xiao
    Song, Shibin
    2021 PROCEEDINGS OF THE 40TH CHINESE CONTROL CONFERENCE (CCC), 2021, : 4248 - 4254
  • [35] Hybrid Actor-Critic Reinforcement Learning in Parameterized Action Space
    Fan, Zhou
    Su, Rui
    Zhang, Weinan
    Yu, Yong
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 2279 - 2285
  • [36] Actor-critic reinforcement learning for the feedback control of a swinging chain
    Dengler, C.
    Lohmann, B.
    IFAC PAPERSONLINE, 2018, 51 (13): 378 - 383
  • [37] A Prioritized objective actor-critic method for deep reinforcement learning
    Nguyen, Ngoc Duy
    Nguyen, Thanh Thi
    Vamplew, Peter
    Dazeley, Richard
    Nahavandi, Saeid
    Neural Computing and Applications, 2021, 33 : 10335 - 10349
  • [38] A Prioritized objective actor-critic method for deep reinforcement learning
    Nguyen, Ngoc Duy
    Nguyen, Thanh Thi
    Vamplew, Peter
    Dazeley, Richard
    Nahavandi, Saeid
    NEURAL COMPUTING & APPLICATIONS, 2021, 33 (16): 10335 - 10349
  • [39] Evaluating Correctness of Reinforcement Learning based on Actor-Critic Algorithm
    Kim, Youngjae
    Hussain, Manzoor
    Suh, Jae-Won
    Hong, Jang-Eui
    2022 THIRTEENTH INTERNATIONAL CONFERENCE ON UBIQUITOUS AND FUTURE NETWORKS (ICUFN), 2022, : 320 - 325
  • [40] Asymmetric Actor-Critic for Adapting to Changing Environments in Reinforcement Learning
    Yue, Wangyang
    Zhou, Yuan
    Zhang, Xiaochuan
    Hua, Yuchen
    Li, Minne
    Fan, Zunlin
    Wang, Zhiyuan
    Kou, Guang
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT IV, 2024, 15019 : 325 - 339