Online Reinforcement Learning by Bayesian Inference

Cited by: 0
Authors
Xia, Zhongpu [1]
Zhao, Dongbin [1]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
Keywords
GAUSSIAN-PROCESSES;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Policy evaluation has long been one of the core issues in online reinforcement learning, especially in continuous state domains. In this paper, the issue is addressed by employing Gaussian processes to represent the action value function from a probabilistic perspective. By modeling the return as a random variable, the action value function can be updated sequentially by Bayesian inference from observed variables such as state and reward during policy evaluation. The update rule shows that the method is a temporal difference learning method whose learning rate is determined by the uncertainty of a collected sample. Combining this policy evaluation method with ε-greedy action selection, we propose an online reinforcement learning algorithm referred to as Bayesian-SARSA. It is tested on several benchmark problems, and the empirical results verify its effectiveness.
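The link the abstract draws between Bayesian inference and temporal difference learning can be illustrated with a much simpler model than the one in the paper. The sketch below is not the authors' Gaussian-process method: it assumes a tabular setting in which each action value has an independent Gaussian belief and each SARSA target is observed with a fixed noise variance. All names (`bayes_td_update`, `sarsa_step`, `noise_var`) are hypothetical. Under those assumptions the conjugate Gaussian update takes the form of a TD update whose step size shrinks as the belief becomes more certain.

```python
import random


def bayes_td_update(mean, var, target, noise_var):
    """Conjugate Gaussian update of one action-value belief N(mean, var).

    The target is treated as a noisy observation of the action value
    with variance noise_var (an assumption of this sketch).
    """
    gain = var / (var + noise_var)            # acts as the learning rate
    new_mean = mean + gain * (target - mean)  # TD-style correction
    new_var = (1.0 - gain) * var              # uncertainty shrinks
    return new_mean, new_var, gain


def epsilon_greedy(means, state, actions, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: means[(state, a)])


def sarsa_step(means, variances, s, a, r, s_next, a_next,
               gamma=0.9, noise_var=1.0):
    """One on-policy update of the belief for (s, a); returns the gain used."""
    target = r + gamma * means[(s_next, a_next)]  # SARSA target
    key = (s, a)
    means[key], variances[key], gain = bayes_td_update(
        means[key], variances[key], target, noise_var
    )
    return gain
```

Repeated calls to `sarsa_step` for the same state-action pair return a decreasing gain, which is the tabular counterpart of a learning rate set by sample uncertainty. A Gaussian-process representation, as used in the paper, would additionally share information across nearby states.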
Pages: 6
Related papers
50 in total
  • [21] Benchmarking for Bayesian Reinforcement Learning
    Castronovo, Michael
    Ernst, Damien
    Couetoux, Adrien
    Fonteneau, Raphael
    PLOS ONE, 2016, 11 (06):
  • [22] Bayesian Inverse Reinforcement Learning
    Ramachandran, Deepak
    Amir, Eyal
    20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007, : 2586 - 2591
  • [23] Bayesian reinforcement learning: A survey
    Ghavamzadeh, Mohammad
    Mannor, Shie
    Pineau, Joelle
    Tamar, Aviv
    Foundations and Trends in Machine Learning, 2015, 8 (5-6): : 359 - 483
  • [24] Bayesian Reinforcement Learning with Exploration
    Lattimore, Tor
    Hutter, Marcus
    ALGORITHMIC LEARNING THEORY (ALT 2014), 2014, 8776 : 170 - 184
  • [25] Online Bayesian inference and learning of Gaussian-process state-space models
    Berntorp, Karl
    AUTOMATICA, 2021, 129
  • [26] Online Bayesian inference for the parameters of PRISM programs
    Cussens, James
    MACHINE LEARNING, 2012, 89 (03) : 279 - 297
  • [27] Efficient Online Bayesian Inference for Neural Bandits
    Duran-Martin, Gerardo
    Kara, Aleyna
    Murphy, Kevin
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151 : 6002 - 6021
  • [28] Neuronal Sequence Models for Bayesian Online Inference
    Froelich, Sascha
    Markovic, Dimitrije
    Kiebel, Stefan J.
    FRONTIERS IN ARTIFICIAL INTELLIGENCE, 2021, 4
  • [29] Bayesian inference - The future of online fraud protection
    Excell, D., 1600, Elsevier Ltd (2012):
  • [30] Online Bayesian inference for the parameters of PRISM programs
    James Cussens
    Machine Learning, 2012, 89 : 279 - 297