Online Learning in Kernelized Markov Decision Processes

Cited by: 0
Authors:
Chowdhury, Sayak Ray [1]
Gopalan, Aditya [1]
Affiliations:
[1] Indian Institute of Science, Bangalore 560012, Karnataka, India
Keywords: none listed
DOI: not available
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
We consider online learning for minimizing regret in unknown, episodic Markov decision processes (MDPs) with continuous states and actions. We develop variants of the UCRL and posterior sampling algorithms that employ nonparametric Gaussian process priors to generalize across the state and action spaces. When the transition and reward functions of the true MDP are members of the associated reproducing kernel Hilbert spaces of functions induced by symmetric, positive semi-definite kernels, we show that the algorithms enjoy sublinear regret bounds. The bounds are in terms of explicit structural parameters of the kernels, namely a novel generalization of the information gain metric from kernelized bandits, and highlight the influence of transition and reward function structure on learning performance. Our results apply to multi-dimensional state and action spaces with composite kernel structures, and generalize results from the literature on kernelized bandits and on the adaptive control of parametric linear dynamical systems with quadratic costs.
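As a rough illustration of the quantities the abstract refers to, the sketch below shows (in Python) two ingredients behind such regret bounds: a Gaussian process posterior used to model an unknown reward function over state-action pairs, and the information gain I(y_T; f) = (1/2) log det(I + sigma^{-2} K_T) that controls how fast uncertainty shrinks as data accumulates. This is a minimal sketch under stated assumptions, not the authors' algorithm: the RBF kernel choice, the helper names rbf_kernel, gp_posterior, and information_gain, and the confidence-width parameter beta are all illustrative.

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    """Symmetric positive semi-definite RBF kernel matrix between rows of X and Y."""
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :]
                - 2.0 * X @ Y.T)
    return np.exp(-0.5 * sq_dists / lengthscale**2)

def information_gain(X, noise_var=0.1, lengthscale=1.0):
    """Information gain I(y_T; f) = 0.5 * log det(I + sigma^{-2} K_T) for the points in X."""
    K = rbf_kernel(X, X, lengthscale)
    T = X.shape[0]
    _, logdet = np.linalg.slogdet(np.eye(T) + K / noise_var)
    return 0.5 * logdet

def gp_posterior(X_train, y_train, X_test, noise_var=0.1, lengthscale=1.0):
    """Posterior mean and variance of a zero-mean GP at X_test, given noisy observations."""
    K = rbf_kernel(X_train, X_train, lengthscale) + noise_var * np.eye(len(X_train))
    K_star = rbf_kernel(X_test, X_train, lengthscale)
    alpha = np.linalg.solve(K, y_train)
    mean = K_star @ alpha
    v = np.linalg.solve(K, K_star.T)
    var = rbf_kernel(X_test, X_test, lengthscale).diagonal() - np.sum(K_star * v.T, axis=1)
    return mean, var

# Example: optimistic (UCB-style) reward estimates over candidate state-action pairs,
# in the spirit of an optimistic GP-based algorithm; beta is a hypothetical confidence width.
rng = np.random.default_rng(0)
Z_hist = rng.uniform(-1, 1, size=(20, 3))            # observed (state, action) pairs
r_hist = np.sin(Z_hist.sum(axis=1)) + 0.1 * rng.standard_normal(20)
Z_cand = rng.uniform(-1, 1, size=(5, 3))             # candidate pairs in the current episode
mu, var = gp_posterior(Z_hist, r_hist, Z_cand)
beta = 2.0
ucb = mu + np.sqrt(beta * var)                        # optimistic reward estimates
print("information gain of history:", information_gain(Z_hist))
print("UCB reward estimates:", ucb)
```

The information-gain quantity printed above is the kind of structural kernel parameter the regret bounds are stated in terms of; for smoother kernels it grows more slowly with the number of observations, which translates into tighter sublinear regret.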
Pages: 9