Online Reinforcement Learning by Bayesian Inference

Times cited: 0

Authors:

Xia, Zhongpu [1]

Zhao, Dongbin [1]

Affiliation:

[1] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China

Source:

2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2015

Keywords:

GAUSSIAN-PROCESSES;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Discipline codes:

081104; 0812; 0835; 1405

Abstract:

Policy evaluation has long been one of the core issues in online reinforcement learning, especially in continuous state domains. In this paper, the issue is addressed by employing Gaussian processes to represent the action value function from a probabilistic perspective. By modeling the return as a random variable, the action value function can be updated sequentially by Bayesian inference from observed variables such as states and rewards during policy evaluation. The update rule shows that this is a temporal difference learning method whose learning rate is determined by the uncertainty of each collected sample. Combining this policy evaluation method with ε-greedy action selection, we propose an online reinforcement learning algorithm referred to as Bayesian-SARSA. It is tested on several benchmark problems, and the empirical results verify its effectiveness.
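The abstract describes policy evaluation as a temporal difference update whose learning rate is set by the uncertainty of each sample, combined with ε-greedy action selection. The sketch below illustrates only that idea, in a deliberately simplified tabular form; it is not the authors' Gaussian-process Bayesian-SARSA. It assumes an independent Gaussian belief for each state-action value and a fixed observation-noise variance, and the class and parameter names (`BayesianQTable`, `prior_var`, `noise_var`) are hypothetical.

```python
import random
from collections import defaultdict


class BayesianQTable:
    """Hypothetical tabular sketch (not the paper's GP model): each Q(s, a)
    carries an independent Gaussian belief N(mean, var)."""

    def __init__(self, actions, prior_var=10.0, noise_var=1.0,
                 gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.gamma = gamma
        self.noise_var = noise_var  # assumed fixed noise on each SARSA target
        self.epsilon = epsilon
        self.mean = defaultdict(float)             # posterior mean of Q(s, a), prior 0
        self.var = defaultdict(lambda: prior_var)  # posterior variance of Q(s, a)
        self.rng = random.Random(seed)

    def select_action(self, state):
        # epsilon-greedy selection on the posterior mean
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.mean[(state, a)])

    def update(self, s, a, r, s_next, a_next, done):
        # Treat the SARSA target as a noisy observation of Q(s, a)
        target = r if done else r + self.gamma * self.mean[(s_next, a_next)]
        var_before = self.var[(s, a)]
        # Gaussian posterior update: the gain acts as a TD learning rate that
        # is large while the estimate is uncertain and shrinks as var falls
        gain = var_before / (var_before + self.noise_var)
        td_error = target - self.mean[(s, a)]
        self.mean[(s, a)] += gain * td_error
        self.var[(s, a)] = (1.0 - gain) * var_before
        return gain
```

Because the gain shrinks as the posterior variance falls, early samples move the estimate strongly and later ones only slightly: with `prior_var=1.0` and `noise_var=1.0`, two terminal rewards of 1.0 for the same state-action pair give gains of 1/2 and then 1/3, leaving a mean of 2/3 and a variance of 1/3.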


Pages: 6
