Online Learning in Markov Decision Processes with Arbitrarily Changing Rewards and Transitions

Times cited: 19

Authors:

Yu, Jia Yuan [1]

Mannor, Shie [1]

Affiliation:

[1] McGill Univ, Dept Elect & Comp Engn, Montreal, PQ H3A 2T5, Canada

Source:

2009 INTERNATIONAL CONFERENCE ON GAME THEORY FOR NETWORKS (GAMENETS 2009) | 2009

DOI:

10.1109/GAMENETS.2009.5137416

Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronics and communication technology]

Subject classification codes:

0808; 0809

Abstract:

We consider decision-making problems in Markov decision processes where both the rewards and the transition probabilities vary in an arbitrary (e.g., non-stationary) fashion. We present algorithms that combine online learning and robust control, and establish guarantees on their performance evaluated in retrospect against alternative policies, i.e., their regret. These guarantees depend critically on the range of uncertainty in the transition probabilities, but hold regardless of the changes in rewards and transition probabilities over time. We present a version of the main algorithm in the setting where the decision-maker's observations are limited to its own trajectory, and another version that allows a trade-off between performance and computational complexity.
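
The notion of regret in the abstract (performance "evaluated in retrospect against alternative policies") can be illustrated with a minimal sketch. It is not the authors' algorithm and does not reproduce their exact regret definition. It assumes a small finite MDP, a comparator class of deterministic stationary policies, and a known realized sequence of reward functions and transition kernels. Each policy is replayed on that sequence by propagating its expected state distribution, and the agent's realized reward is subtracted from the best such value. The index conventions (`rewards[t][s][a]`, `transitions[t][a][s][s2]`) and the function names `expected_total_reward` and `hindsight_regret` are hypothetical.

```python
import itertools
from typing import List, Sequence, Tuple

# Hypothetical conventions for this sketch (not taken from the paper):
#   rewards[t][s][a]          reward for action a in state s at step t
#   transitions[t][a][s][s2]  probability of moving s -> s2 under action a at step t
Policy = Tuple[int, ...]  # deterministic stationary policy: policy[s] is the action in state s


def expected_total_reward(
    policy: Policy,
    rewards: Sequence[List[List[float]]],
    transitions: Sequence[List[List[List[float]]]],
    init_dist: Sequence[float],
) -> float:
    """Expected total reward of a fixed policy replayed on the realized sequence."""
    n = len(init_dist)
    dist = list(init_dist)
    total = 0.0
    for r_t, p_t in zip(rewards, transitions):
        # Expected reward at this step under the current state distribution.
        total += sum(dist[s] * r_t[s][policy[s]] for s in range(n))
        # Push the state distribution forward through this step's kernel.
        nxt = [0.0] * n
        for s in range(n):
            row = p_t[policy[s]][s]
            for s2 in range(n):
                nxt[s2] += dist[s] * row[s2]
        dist = nxt
    return total


def hindsight_regret(
    agent_reward: float,
    rewards: Sequence[List[List[float]]],
    transitions: Sequence[List[List[List[float]]]],
    init_dist: Sequence[float],
    num_actions: int,
) -> float:
    """Best fixed policy's expected reward in hindsight minus the agent's reward.

    Enumerates all num_actions ** n deterministic policies, so it only suits tiny MDPs.
    """
    n = len(init_dist)
    best = max(
        expected_total_reward(pi, rewards, transitions, init_dist)
        for pi in itertools.product(range(num_actions), repeat=n)
    )
    return best - agent_reward


if __name__ == "__main__":
    # One state, two actions; the better action switches over time.
    rewards = [[[1.0, 0.0]], [[0.0, 1.0]], [[1.0, 0.0]]]
    transitions = [[[[1.0]], [[1.0]]]] * 3
    print(hindsight_regret(1.5, rewards, transitions, [1.0], 2))  # 0.5
    print(hindsight_regret(3.0, rewards, transitions, [1.0], 2))  # -1.0
```

In the usage example the best fixed policy earns 2.0. An agent that switches actions can beat every fixed policy, so its regret can be negative. With changing transitions, the value of a comparator policy depends on the kernel sequence it is replayed against. This dependence is one reason guarantees of the kind described above hinge on how much the transition probabilities can vary.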

Pages: 314-322

Page count: 9

Related records (50 in total):

[1] Online Learning in Markov Decision Processes with Changing Cost Sequences
Dick, Travis
Gyorgy, Andras
Szepesvari, Csaba
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 1), 2014, 32
[2] Large Scale Markov Decision Processes with Changing Rewards
Cardoso, Adrian Rivera
Wang, He
Xu, Huan
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
[3] Blackwell Online Learning for Markov Decision Processes
Li, Tao
Peng, Guanze
Zhu, Quanyan
2021 55TH ANNUAL CONFERENCE ON INFORMATION SCIENCES AND SYSTEMS (CISS), 2021
[4] Online Learning in Kernelized Markov Decision Processes
Chowdhury, Sayak Ray
Gopalan, Aditya
22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89
[5] Arbitrarily Modulated Markov Decision Processes
Yu, Jia Yuan
Mannor, Shie
PROCEEDINGS OF THE 48TH IEEE CONFERENCE ON DECISION AND CONTROL, 2009 HELD JOINTLY WITH THE 2009 28TH CHINESE CONTROL CONFERENCE (CDC/CCC 2009), 2009: 2946-2953
[6] Markov Decision Processes with Functional Rewards
Spanjaard, Olivier
Weng, Paul
MULTI-DISCIPLINARY TRENDS IN ARTIFICIAL INTELLIGENCE, 2013, 8271: 269-280
[7] Online Regret Bounds for Markov Decision Processes with Deterministic Transitions
Ortner, Ronald
ALGORITHMIC LEARNING THEORY, PROCEEDINGS, 2008, 5254: 123-137
[8] Online regret bounds for Markov decision processes with deterministic transitions
Ortner, Ronald
THEORETICAL COMPUTER SCIENCE, 2010, 411 (29-30): 2684-2695
[9] Online Learning of Safety function for Markov Decision Processes
Mazumdar, Abhijit
Wisniewski, Rafal
Bujorianu, Manuela L.
2023 EUROPEAN CONTROL CONFERENCE, ECC, 2023
[10] Online Learning in Markov Decision Processes with Continuous Actions
Hong, Yi-Te
Lu, Chi-Jen
ALGORITHMIC LEARNING THEORY, ALT 2015, 2015, 9355: 302-316
