From minimax value to low-regret algorithms for online Markov decision processes

Cited by: 0
Authors: Guan, Peng [1]; Raginsky, Maxim [1]; Willett, Rebecca [1]
Affiliation: [1] Duke Univ, Dept Elect & Comp Engn, Durham, NC 27708 USA
Keywords: (none listed)
DOI: not available
CLC classification: TP [automation technology; computer technology]
Discipline code: 0812
Abstract:
The standard Markov Decision Process (MDP) framework assumes a stationary (or at least predictable) environment. Online learning algorithms can cope with non-stationary or unpredictable environments, but they have no notion of a state that evolves over the course of learning as a function of past actions. In recent years, there has been growing interest in combining these two frameworks and considering an MDP setting in which the cost function is allowed to change arbitrarily after each time step. However, most of the work in this area has been algorithmic: given a problem, one would design an algorithm from scratch and analyze its performance on a case-by-case basis. Moreover, the presence of the state and the assumption of an arbitrarily varying environment complicate both the theoretical analysis and the development of computationally efficient methods. This paper builds on recent results of Rakhlin et al. to give a general framework for deriving algorithms in an MDP setting with arbitrarily changing costs. This framework leads to a unifying view of existing methods and provides a general procedure for constructing new ones.
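To make the setting concrete, here is a minimal Python sketch (not the paper's algorithm) of regret minimization in an MDP whose cost function changes arbitrarily at every step. It runs the exponentially weighted average forecaster (Hedge) over a small, hypothetical class of stationary policies; all names (P, policies, eta, horizon) are illustrative assumptions, and scoring the comparator policies along the learner's own trajectory sidesteps the state-distribution mismatch that the paper's minimax framework is designed to handle.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy finite MDP: 3 states, 2 actions, known transition kernel
    # P[s, a] = distribution over next states (hypothetical instance).
    n_states, n_actions, horizon = 3, 2, 500
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))

    # Hypothetical comparator class: deterministic stationary policies,
    # here simply "always play action a" for each action a.
    policies = [np.full(n_states, a) for a in range(n_actions)]

    # Hedge over the policy class with the standard learning rate.
    eta = np.sqrt(8.0 * np.log(len(policies)) / horizon)
    weights = np.ones(len(policies))
    cum_alg_cost = 0.0
    cum_policy_cost = np.zeros(len(policies))
    state = 0

    for t in range(horizon):
        # Adversary picks an arbitrary cost function c_t(s, a) in [0, 1];
        # it is revealed only after the learner commits to an action.
        c_t = rng.random((n_states, n_actions))

        # Sample a policy from the Hedge distribution and act with it.
        probs = weights / weights.sum()
        k = rng.choice(len(policies), p=probs)
        action = policies[k][state]
        cum_alg_cost += c_t[state, action]

        # Full-information feedback: evaluate every candidate policy at the
        # current state (a simplification, as noted above) and reweight.
        losses = np.array([c_t[state, pi[state]] for pi in policies])
        cum_policy_cost += losses
        weights *= np.exp(-eta * losses)

        # The state evolves as a function of the past action.
        state = rng.choice(n_states, p=P[state, action])

    regret = cum_alg_cost - cum_policy_cost.min()
    print(f"regret vs. best fixed policy in hindsight: {regret:.2f} "
          f"over {horizon} rounds")

Under these assumptions, Hedge's standard guarantee bounds the expected regret by sqrt((T/2) ln N) for N candidate policies over T rounds, which is the kind of low-regret behavior the abstract refers to; the paper's contribution is a general minimax-based procedure for deriving such algorithms rather than this particular construction.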
Pages: 471-476
Number of pages: 6
Related papers (items [31]-[40] of 50 shown):
  • [31] The empirical Bayes envelope and regret minimization in competitive Markov decision processes
    Mannor, S
    Shimkin, N
    MATHEMATICS OF OPERATIONS RESEARCH, 2003, 28 (02) : 327 - 345
  • [32] Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes
    Tian, Yi
    Qian, Jian
    Sra, Suvrit
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [33] Online Markov Decision Processes Configuration with Continuous Decision Space
    Maran, Davide
    Olivieri, Pierriccardo
    Stradi, Francesco Emanuele
    Urso, Giuseppe
    Gatti, Nicola
    Restelli, Marcello
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 13, 2024: 14315 - 14322
  • [34] Value set iteration for Markov decision processes
    Chang, Hyeong Soo
    AUTOMATICA, 2014, 50 (07) : 1940 - 1943
  • [35] Continuity of the value of competitive Markov decision processes
    Solan, E
    JOURNAL OF THEORETICAL PROBABILITY, 2003, 16 (04) : 831 - 845
  • [37] Verification of Markov Decision Processes Using Learning Algorithms
    Brazdil, Tomas
    Chatterjee, Krishnendu
    Chmelik, Martin
    Forejt, Vojtech
    Kretinsky, Jan
    Kwiatkowska, Marta
    Parker, David
    Ujma, Mateusz
    AUTOMATED TECHNOLOGY FOR VERIFICATION AND ANALYSIS, ATVA 2014, 2014, 8837 : 98 - 114
  • [38] Hierarchical algorithms for discounted and weighted Markov decision processes
    Abbad, M
    Daoui, C
    MATHEMATICAL METHODS OF OPERATIONS RESEARCH, 2003, 58 (02) : 237 - 245
  • [39] On some algorithms for limiting average Markov decision processes
    Daoui, C.
    Abbad, M.
    OPERATIONS RESEARCH LETTERS, 2007, 35 (02) : 261 - 266
  • [40] Improved Algorithms for Misspecified Linear Markov Decision Processes
    Vial, Daniel
    Parulekar, Advait
    Shakkottai, Sanjay
    Srikant, R.
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151