Online Learning in Kernelized Markov Decision Processes

Cited by: 0

Authors:

Chowdhury, Sayak Ray [1]

Gopalan, Aditya [1]

Affiliations:

[1] Indian Inst Sci, Bangalore 560012, Karnataka, India

Source:

22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89 | 2019 / Vol. 89

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We consider online learning for minimizing regret in unknown, episodic Markov decision processes (MDPs) with continuous states and actions. We develop variants of the UCRL and posterior sampling algorithms that employ nonparametric Gaussian process priors to generalize across the state and action spaces. When the transition and reward functions of the true MDP are members of the associated reproducing kernel Hilbert spaces (RKHSs) of functions induced by symmetric positive semi-definite (psd) kernels, we show that the algorithms enjoy sublinear regret bounds. The bounds are stated in terms of explicit structural parameters of the kernels, namely a novel generalization of the information gain metric from kernelized bandits, and they highlight the influence of transition and reward function structure on learning performance. Our results apply to multi-dimensional state and action spaces with composite kernel structures, and they generalize results from the literature on kernelized bandits and on the adaptive control of parametric linear dynamical systems with quadratic costs.

Pages: 9

50 items in total

[31] Online Regret Bounds for Markov Decision Processes with Deterministic Transitions
Ortner, Ronald
ALGORITHMIC LEARNING THEORY, PROCEEDINGS, 2008, 5254 : 123 - 137
[32] Online regret bounds for Markov decision processes with deterministic transitions
Ortner, Ronald
THEORETICAL COMPUTER SCIENCE, 2010, 411 (29-30) : 2684 - 2695
[33] Online Planning for Large Markov Decision Processes with Hierarchical Decomposition
Bai, Aijun
Wu, Feng
Chen, Xiaoping
ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2015, 6 (04)
[34] Simple Regret Optimization in Online Planning for Markov Decision Processes
Feldman, Zohar
Domshlak, Carmel
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2014, 51 : 165 - 205
[35] Learning Policies for Markov Decision Processes in Continuous Spaces
Paternain, Santiago
Bazerque, Juan Andres
Small, Austin
Ribeiro, Alejandro
2018 IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2018 : 4751 - 4758
[36] Active Learning of Markov Decision Processes for System Verification
Chen, Yingke
Nielsen, Thomas Dyhre
2012 11TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2012), VOL 2, 2012 : 289 - 294
[37] Active learning in partially observable Markov decision processes
Jaulmes, R
Pineau, J
Precup, D
MACHINE LEARNING: ECML 2005, PROCEEDINGS, 2005, 3720 : 601 - 608
[38] Learning Policies for Markov Decision Processes From Data
Hanawal, Manjesh Kumar
Liu, Hao
Zhu, Henghui
Paschalidis, Ioannis Ch.
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2019, 64 (06) : 2298 - 2309
[39] Concurrent Markov decision processes for robot team learning
Girard, Justin
Emami, M. Reza
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2015, 39 : 223 - 234
[40] Learning Adversarial Markov Decision Processes with Delayed Feedback
Lancewicki, Tal
Rosenberg, Aviv
Mansour, Yishay
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022 : 7281 - 7289
