Online Learning in Kernelized Markov Decision Processes

Cited by: 0
Authors:
Chowdhury, Sayak Ray [1]
Gopalan, Aditya [1]
Affiliations:
[1] Indian Inst Sci, Bangalore 560012, Karnataka, India
Keywords: (none listed)
DOI: (not available)
CLC classification: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract:
We consider online learning for minimizing regret in unknown, episodic Markov decision processes (MDPs) with continuous states and actions. We develop variants of the UCRL and posterior sampling algorithms that employ nonparametric Gaussian process priors to generalize across the state and action spaces. When the transition and reward functions of the true MDP are members of the associated Reproducing Kernel Hilbert Spaces of functions induced by symmetric, positive semi-definite kernels, we show that the algorithms enjoy sublinear regret bounds. The bounds are in terms of explicit structural parameters of the kernels, namely a novel generalization of the information gain metric from the kernelized bandit setting, and highlight the influence of transition and reward function structure on the learning performance. Our results are applicable to multi-dimensional state and action spaces with composite kernel structures, and generalize results from the literature on kernelized bandits and on the adaptive control of parametric linear dynamical systems with quadratic costs.
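The abstract's regret bounds are stated in terms of the maximum information gain of the kernel. As a minimal illustrative sketch (not the paper's algorithm), the following computes the standard Gaussian-process information gain I(y_A; f) = (1/2) log det(I + σ⁻² K_A) for a set of sampled points under a squared-exponential kernel; the function names, the lengthscale, and the noise variance are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    # Squared-exponential kernel matrix: k(x, y) = exp(-||x - y||^2 / (2 l^2))
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :]
                - 2.0 * X @ Y.T)
    return np.exp(-sq_dists / (2.0 * lengthscale**2))

def information_gain(X, noise_var=0.1, lengthscale=1.0):
    # GP information gain of observations at X with noise variance sigma^2:
    # I(y_A; f) = 0.5 * log det(I + sigma^{-2} K_A)
    K = rbf_kernel(X, X, lengthscale)
    n = X.shape[0]
    sign, logdet = np.linalg.slogdet(np.eye(n) + K / noise_var)
    return 0.5 * logdet

# 50 sampled state-action points in a 2-D continuous space (illustrative data)
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 2))
gamma = information_gain(X)
```

In the kernelized-bandit literature this quantity, maximized over all size-T subsets, is the γ_T that governs sublinear regret rates; the sketch above only evaluates it for one fixed set of points.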
Pages: 9
Related Papers (50 total)
  • [41] A reinforcement learning based algorithm for Markov decision processes
    Bhatnagar, S
    Kumar, S
    2005 International Conference on Intelligent Sensing and Information Processing, Proceedings, 2005, : 199 - 204
  • [42] Verification of Markov Decision Processes Using Learning Algorithms
    Brazdil, Tomas
    Chatterjee, Krishnendu
    Chmelik, Martin
    Forejt, Vojtech
    Kretinsky, Jan
    Kwiatkowska, Marta
    Parker, David
    Ujma, Mateusz
    AUTOMATED TECHNOLOGY FOR VERIFICATION AND ANALYSIS, ATVA 2014, 2014, 8837 : 98 - 114
  • [43] Recursive learning automata approach to Markov decision processes
    Chang, Hyeong Soo
    Fu, Michael C.
    Hu, Jiaqiao
    Marcus, Steven I.
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2007, 52 (07) : 1349 - 1355
  • [44] Learning and Planning with Timing Information in Markov Decision Processes
    Bacon, Pierre-Luc
    Balle, Borja
    Precup, Doina
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2015, : 111 - 120
  • [45] Learning algorithms for Markov decision processes with average cost
    Abounadi, J
    Bertsekas, D
    Borkar, VS
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2001, 40 (03) : 681 - 698
  • [46] Combining Learning Algorithms: An Approach to Markov Decision Processes
    Ribeiro, Richardson
    Favarim, Fabio
    Barbosa, Marco A. C.
    Koerich, Alessandro L.
    Enembreck, Fabricio
    ENTERPRISE INFORMATION SYSTEMS, ICEIS 2012, 2013, 141 : 172 - 188
  • [47] A sensitivity view of Markov decision processes and reinforcement learning
    Cao, XR
    MODELING, CONTROL AND OPTIMIZATION OF COMPLEX SYSTEMS: IN HONOR OF PROFESSOR YU-CHI HO, 2003, 14 : 261 - 283
  • [48] Online reinforcement learning for condition-based group maintenance using factored Markov decision processes
    Xu, Jianyu
    Liu, Bin
    Zhao, Xiujie
    Wang, Xiao-Lin
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2024, 315 (01) : 176 - 190
  • [49] Kernelized Online Imbalanced Learning with Fixed Budgets
    Hu, Junjie
    Yang, Haiqin
    King, Irwin
    Lyu, Michael R.
    So, Anthony Man-Cho
    PROCEEDINGS OF THE TWENTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2015, : 2666 - 2672
  • [50] Parallel rollout for online solution of partially observable Markov decision processes
    Chang, HS
    Givan, R
    Chong, EKP
    DISCRETE EVENT DYNAMIC SYSTEMS-THEORY AND APPLICATIONS, 2004, 14 (03): : 309 - 341