From minimax value to low-regret algorithms for online Markov decision processes

Cited by: 0
Authors: Guan, Peng [1]; Raginsky, Maxim [1]; Willett, Rebecca [1]
Affiliations: [1] Duke Univ, Dept Elect & Comp Engn, Durham, NC 27708 USA
DOI: not available
Chinese Library Classification: TP [Automation & Computer Technology]
Discipline code: 0812
Abstract
The standard Markov Decision Process (MDP) framework assumes a stationary (or at least predictable) environment. Online learning algorithms can deal with non-stationary or unpredictable environments, but there is no notion of a state that might be changing throughout the learning process as a function of past actions. In recent years, there has been a growing interest in combining the above two frameworks and considering an MDP setting, where the cost function is allowed to change arbitrarily after each time step. However, most of the work in this area has been algorithmic: given a problem, one would design an algorithm from scratch and analyze its performance on a case-by-case basis. Moreover, the presence of the state and the assumption of an arbitrarily varying environment complicate both the theoretical analysis and the development of computationally efficient methods. This paper builds on recent results of Rakhlin et al. to give a general framework for deriving algorithms in an MDP setting with arbitrarily changing costs. This framework leads to a unifying view of existing methods and provides a general procedure for constructing new ones.
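The protocol described in the abstract can be made concrete with a toy sketch: transition dynamics are fixed and known, the cost function changes arbitrarily at every step, and regret is measured against the best stationary policy chosen in hindsight. This is an illustrative sketch only, not the paper's algorithm; the dynamics, the cost sequence, and the naive baseline learner are all assumptions made for this example.

```python
from itertools import product

N_STATES, N_ACTIONS, T = 3, 2, 120

def transition(state, action):
    # Fixed, known dynamics (deterministic for simplicity).
    return (state + action + 1) % N_STATES

def cost(t, state, action):
    # Arbitrarily varying cost sequence: the adversary may pick any
    # bounded costs per step; this is just a reproducible stand-in.
    return ((7 * t + 3 * state + 5 * action) % 10) / 10.0

def total_cost(policy):
    """Run a stationary deterministic policy (tuple: state -> action)
    against the realized cost sequence; return cumulative cost."""
    s, total = 0, 0.0
    for t in range(T):
        a = policy[s]
        total += cost(t, s, a)
        s = transition(s, a)
    return total

# Naive baseline learner: always play action 0 in every state.
learner = tuple(0 for _ in range(N_STATES))
learner_cost = total_cost(learner)

# Regret compares the learner to the best stationary deterministic
# policy chosen in hindsight, with the whole cost sequence known.
comparators = list(product(range(N_ACTIONS), repeat=N_STATES))
best_cost = min(total_cost(p) for p in comparators)
regret = learner_cost - best_cost
print(f"learner cost {learner_cost:.1f}, "
      f"best in hindsight {best_cost:.1f}, regret {regret:.1f}")
```

Because the baseline learner is itself one of the enumerated stationary policies, its regret here is guaranteed nonnegative; an online algorithm of the kind the paper develops would instead adapt its policy over time so that regret grows sublinearly in T.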
Pages: 471 - 476 (6 pages)
Related papers (50 total; first 10 shown)
  • [1] The Bayesian Prophet: A Low-Regret Framework for Online Decision Making
    Vera, Alberto
    Banerjee, Siddhartha
    MANAGEMENT SCIENCE, 2021, 67 (03) : 1368 - 1391
  • [2] The Bayesian Prophet: A Low-Regret Framework for Online Decision Making
    Vera, Alberto
    Banerjee, Siddhartha
    PERFORMANCE EVALUATION REVIEW, 2019, 47 (01): 81 - 82
  • [3] Dynamic Regret of Online Markov Decision Processes
    Zhao, Peng
    Li, Long-Fei
    Zhou, Zhi-Hua
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [4] Minimax Regret Optimisation for Robust Planning in Uncertain Markov Decision Processes
    Rigter, Marc
    Lacerda, Bruno
    Hawes, Nick
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 11930 - 11938
  • [5] Online Regret Bounds for Markov Decision Processes with Deterministic Transitions
    Ortner, Ronald
    ALGORITHMIC LEARNING THEORY, PROCEEDINGS, 2008, 5254 : 123 - 137
  • [6] Online regret bounds for Markov decision processes with deterministic transitions
    Ortner, Ronald
    THEORETICAL COMPUTER SCIENCE, 2010, 411 (29-30) : 2684 - 2695
  • [7] Simple Regret Optimization in Online Planning for Markov Decision Processes
    Feldman, Zohar
    Domshlak, Carmel
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2014, 51 : 165 - 205
  • [8] Contextual Recommendations and Low-Regret Cutting-Plane Algorithms
    Gollapudi, Sreenivas
    Guruganesh, Guru
    Kollias, Kostas
    Manurangsi, Pasin
    Leme, Renato Paes
    Schneider, Jon
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [9] Reinforcement Learning Algorithms for Regret Minimization in Structured Markov Decision Processes
    Prabuchandran, K. J.
    Bodas, Tejas
    Tulabandhula, Theja
    AAMAS'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS & MULTIAGENT SYSTEMS, 2016, : 1289 - 1290
  • [10] On Markov policies for minimax decision processes
    Iwamoto, S
    Tsurusaki, K
    JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS, 2001, 253 (01) : 58 - 78