From minimax value to low-regret algorithms for online Markov decision processes

Cited: 0
Authors
Guan, Peng [1]
Raginsky, Maxim [1]
Willett, Rebecca [1]
Affiliations
[1] Duke University, Department of Electrical and Computer Engineering, Durham, NC 27708, USA
Keywords
DOI
Not available
CLC Classification Number
TP [Automation Technology; Computer Technology]
Discipline Classification Code
0812
Abstract
The standard Markov Decision Process (MDP) framework assumes a stationary (or at least predictable) environment. Online learning algorithms can handle non-stationary or unpredictable environments, but they lack a notion of a state that evolves throughout the learning process as a function of past actions. In recent years, there has been growing interest in combining these two frameworks by considering an MDP setting in which the cost function is allowed to change arbitrarily after each time step. However, most of the work in this area has been algorithmic: given a problem, one designs an algorithm from scratch and analyzes its performance on a case-by-case basis. Moreover, the presence of the state and the assumption of an arbitrarily varying environment complicate both the theoretical analysis and the development of computationally efficient methods. This paper builds on recent results of Rakhlin et al. to give a general framework for deriving algorithms in an MDP setting with arbitrarily changing costs. This framework leads to a unifying view of existing methods and provides a general procedure for constructing new ones.
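For context, the online-MDP line of work this paper belongs to typically measures performance by regret against the best stationary policy in hindsight. A standard formulation of this benchmark (a sketch; the notation below is illustrative and not quoted from the paper) is

$$ R_T \;=\; \mathbb{E}\!\left[ \sum_{t=1}^{T} c_t(x_t, a_t) \right] \;-\; \min_{\pi \in \Pi}\, \mathbb{E}^{\pi}\!\left[ \sum_{t=1}^{T} c_t(x_t^{\pi}, a_t^{\pi}) \right], $$

where $x_t$ and $a_t$ are the learner's state and action at step $t$, $c_1, \ldots, c_T$ is the arbitrarily varying sequence of cost functions chosen by the environment, and $\Pi$ is a comparison class of stationary policies. An algorithm is called low-regret if $R_T$ grows sublinearly in $T$.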
Pages: 471-476 (6 pages)
Related Papers
50 items in total
  • [41] Learning algorithms for Markov decision processes with average cost
    Abounadi, J
    Bertsekas, D
    Borkar, VS
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2001, 40 (03) : 681 - 698
  • [42] Combining Learning Algorithms: An Approach to Markov Decision Processes
    Ribeiro, Richardson
    Favarim, Fabio
    Barbosa, Marco A. C.
    Koerich, Alessandro L.
    Enembreck, Fabricio
    ENTERPRISE INFORMATION SYSTEMS, ICEIS 2012, 2013, 141 : 172 - 188
  • [43] Hierarchical algorithms for discounted and weighted Markov decision processes
    Abbad, M
    Daoui, C
    MATHEMATICAL METHODS OF OPERATIONS RESEARCH, 2003, 58 : 237 - 245
  • [44] IMED-RL: Regret optimal learning of ergodic Markov decision processes
    Pesquerel, Fabien
    Maillard, Odalric-Ambrym
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [45] Sampling Based Approaches for Minimizing Regret in Uncertain Markov Decision Processes (MDPs)
    Ahmed, Asrar
    Varakantham, Pradeep
    Lowalekar, Meghna
    Adulyasak, Yossiri
    Jaillet, Patrick
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2017, 59 : 229 - 264
  • [46] Online Learning of Safety function for Markov Decision Processes
    Mazumdar, Abhijit
    Wisniewski, Rafal
    Bujorianu, Manuela L.
    2023 EUROPEAN CONTROL CONFERENCE (ECC), 2023
  • [47] Online Convex Optimization in Adversarial Markov Decision Processes
    Rosenberg, Aviv
    Mansour, Yishay
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019
  • [48] Online Markov Decision Processes Under Bandit Feedback
    Neu, Gergely
    Gyoergy, Andras
    Szepesvari, Csaba
    Antos, Andras
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2014, 59 (03) : 676 - 691
  • [49] Online Learning in Markov Decision Processes with Continuous Actions
    Hong, Yi-Te
    Lu, Chi-Jen
    ALGORITHMIC LEARNING THEORY, ALT 2015, 2015, 9355 : 302 - 316
  • [50] Provably Efficient Representation Selection in Low-rank Markov Decision Processes: From Online to Offline RL
    Zhang, Weitong
    He, Jiafan
    Zhou, Dongruo
    Zhang, Amy
    Gu, Quanquan
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216 : 2488 - 2497