Online Learning in Kernelized Markov Decision Processes

被引：0

作者：

Chowdhury, Sayak Ray ^{[1
]}

Gopalan, Aditya ^{[1
]}

机构：

[1] Indian Inst Sci, Bangalore 560012, Karnataka, India

来源：

22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89 | 2019年 / 89卷

关键词：

D O I：

暂无

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

We consider online learning for minimizing regret in unknown, episodic Markov decision processes (MDPs) with continuous states and actions. We develop variants of the UCRL and posterior sampling algorithms that employ non parametric Gaussian process priors to generalize across the state and action spaces. When the transition and reward functions of the true MDP are members of the associated Reproducing Kernel Hilbert Spaces of functions induced by symmetric psd kernels, we show that the algorithms enjoy sublinear regret bounds. The bounds are in terms of explicit structural parameters of the kernels, namely a novel generalization of the information gain metric from kernelized bandit, and highlight the influence of transition and reward function structure on the learning performance. Our results are applicable to multi-dimensional state and action spaces with composite kernel structures, and generalize results from the literature on kernelized bandits, and the adaptive control of parametric linear dynamical systems with quadratic costs.

引用

页数：9

共 50 条

[1] Blackwell Online Learning for Markov Decision Processes
Li, Tao
Peng, Guanze
Zhu, Quanyan
2021 55TH ANNUAL CONFERENCE ON INFORMATION SCIENCES AND SYSTEMS (CISS), 2021,
[2] Online Learning of Safety function for Markov Decision Processes
Mazumdar, Abhijit
Wisniewski, Rafal
Bujorianu, Manuela L.
2023 EUROPEAN CONTROL CONFERENCE, ECC, 2023,
[3] Online Learning in Markov Decision Processes with Continuous Actions
Hong, Yi-Te
Lu, Chi-Jen
ALGORITHMIC LEARNING THEORY, ALT 2015, 2015, 9355 : 302 - 316
[4] Online Markov Decision Processes
Even-Dar, Eyal
Kakade, Sham M.
Mansour, Yishay
MATHEMATICS OF OPERATIONS RESEARCH, 2009, 34 (03) : 726 - 736
[5] Kernelized Q-Learning for Large-Scale, Potentially Continuous, Markov Decision Processes
Sledge, Isaac J.
Principe, Jose C.
2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018, : 153 - 162
[6] Online Learning in Markov Decision Processes with Changing Cost Sequences
Dick, Travis
Gyorgy, Andras
Szepesvari, Csaba
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 1), 2014, 32
[7] Online Learning with Implicit Exploration in Episodic Markov Decision Processes
Ghasemi, Mahsa
Hashemi, Abolfazl
Vikalo, Haris
Topcu, Ufuk
2021 AMERICAN CONTROL CONFERENCE (ACC), 2021, : 1953 - 1958
[8] Online Learning in Markov Decision Processes with Arbitrarily Changing Rewards and Transitions
Yu, Jia Yuan
Mannor, Shie
2009 INTERNATIONAL CONFERENCE ON GAME THEORY FOR NETWORKS (GAMENETS 2009), 2009, : 314 - 322
[9] Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes
Roy, Arghyadip
Borkar, Vivek
Karandikar, Abhay
Chaporkar, Prasanna
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2022, 67 (07) : 3722 - 3729
[10] A Structure-aware Online Learning Algorithm for Markov Decision Processes
Roy, Arghyadip
Borkar, Vivek
Karandikar, Abhay
Chaporkar, Prasanna
PROCEEDINGS OF THE 12TH EAI INTERNATIONAL CONFERENCE ON PERFORMANCE EVALUATION METHODOLOGIES AND TOOLS (VALUETOOLS 2019), 2019, : 71 - 78

← 1 2 3 4 5 →