Online Learning in Kernelized Markov Decision Processes

Cited by: 0
Authors
Chowdhury, Sayak Ray [1]
Gopalan, Aditya [1]
Affiliations
[1] Indian Inst Sci, Bangalore 560012, Karnataka, India
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We consider online learning to minimize regret in unknown, episodic Markov decision processes (MDPs) with continuous state and action spaces. We develop variants of the UCRL and posterior sampling algorithms that employ nonparametric Gaussian process (GP) priors to generalize across the state and action spaces. When the transition and reward functions of the true MDP are members of the reproducing kernel Hilbert spaces (RKHSs) induced by symmetric, positive semi-definite kernels, we show that the algorithms enjoy sublinear regret bounds. The bounds are expressed in terms of explicit structural parameters of the kernels, namely a novel generalization of the information gain metric from kernelized bandits, and highlight the influence of transition and reward function structure on learning performance. Our results apply to multi-dimensional state and action spaces with composite kernel structures, and generalize results from the literature on kernelized bandits and on the adaptive control of parametric linear dynamical systems with quadratic costs.
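To make the abstract's mechanism concrete, the following is a minimal Python sketch (not the authors' implementation) of the core primitive both algorithm families rely on: a GP posterior over an unknown function, such as the reward or one coordinate of the transition map, whose mean plus a scaled posterior width yields the optimistic (UCB-style) confidence sets used for planning. The RBF kernel, lengthscale, noise level, grid, and confidence width beta are all illustrative assumptions; in the paper the width is tied to the information gain of the kernel.

```python
# Minimal GP-posterior sketch under assumed hyperparameters (numpy only).
import numpy as np

def rbf_kernel(A, B, lengthscale=0.5):
    """Symmetric, positive semi-definite RBF kernel between row-stacked points."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * sq / lengthscale**2)

def gp_posterior(X, y, Xstar, noise=0.1):
    """Posterior mean and std. dev. of a zero-mean GP at test points Xstar."""
    K = rbf_kernel(X, X) + noise**2 * np.eye(len(X))
    Ks = rbf_kernel(X, Xstar)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = rbf_kernel(Xstar, Xstar).diagonal() - np.sum(v**2, axis=0)
    return mu, np.sqrt(np.maximum(var, 0.0))

# Toy usage: optimistic choice over a 1-D stand-in for the state-action space.
rng = np.random.default_rng(0)
f = lambda z: np.sin(3 * z).ravel()              # hypothetical unknown reward
X = rng.uniform(-1, 1, (15, 1))                  # observed state-action points
y = f(X) + 0.1 * rng.standard_normal(15)         # noisy observed rewards
grid = np.linspace(-1, 1, 200)[:, None]          # candidate points
mu, sigma = gp_posterior(X, y, grid)
beta = 2.0                                       # assumed confidence width
z_opt = grid[np.argmax(mu + beta * sigma)]       # optimistic pick
print("optimistic point:", z_opt)
```

A posterior-sampling variant would replace the optimistic term mu + beta * sigma with a single function drawn from the GP posterior and act greedily with respect to that draw.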
Pages: 9
Related Papers
50 items in total
  • [31] Online Regret Bounds for Markov Decision Processes with Deterministic Transitions
    Ortner, Ronald
    ALGORITHMIC LEARNING THEORY, PROCEEDINGS, 2008, 5254 : 123 - 137
  • [32] Online regret bounds for Markov decision processes with deterministic transitions
    Ortner, Ronald
    THEORETICAL COMPUTER SCIENCE, 2010, 411 (29-30) : 2684 - 2695
  • [33] Online Planning for Large Markov Decision Processes with Hierarchical Decomposition
    Bai, Aijun
    Wu, Feng
    Chen, Xiaoping
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2015, 6 (04)
  • [34] Simple Regret Optimization in Online Planning for Markov Decision Processes
    Feldman, Zohar
    Domshlak, Carmel
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2014, 51 : 165 - 205
  • [35] Learning Policies for Markov Decision Processes in Continuous Spaces
    Paternain, Santiago
    Bazerque, Juan Andres
    Small, Austin
    Ribeiro, Alejandro
    2018 IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2018, : 4751 - 4758
  • [36] Active Learning of Markov Decision Processes for System Verification
    Chen, Yingke
    Nielsen, Thomas Dyhre
    2012 11TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2012), VOL 2, 2012, : 289 - 294
  • [37] Active learning in partially observable Markov decision processes
    Jaulmes, R
    Pineau, J
    Precup, D
    MACHINE LEARNING: ECML 2005, PROCEEDINGS, 2005, 3720 : 601 - 608
  • [38] Learning Policies for Markov Decision Processes From Data
    Hanawal, Manjesh Kumar
    Liu, Hao
    Zhu, Henghui
    Paschalidis, Ioannis Ch.
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2019, 64 (06) : 2298 - 2309
  • [39] Concurrent Markov decision processes for robot team learning
    Girard, Justin
    Emami, M. Reza
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2015, 39 : 223 - 234
  • [40] Learning Adversarial Markov Decision Processes with Delayed Feedback
    Lancewicki, Tal
    Rosenberg, Aviv
    Mansour, Yishay
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 7281 - 7289