From minimax value to low-regret algorithms for online Markov decision processes

Cited by: 0
Authors: Guan, Peng [1]; Raginsky, Maxim [1]; Willett, Rebecca [1]
Affiliations: [1] Duke Univ, Dept Elect & Comp Engn, Durham, NC 27708 USA
DOI: not available
Chinese Library Classification: TP [Automation & Computer Technology]
Discipline code: 0812
Abstract
The standard Markov Decision Process (MDP) framework assumes a stationary (or at least predictable) environment. Online learning algorithms can deal with non-stationary or unpredictable environments, but there is no notion of a state that might be changing throughout the learning process as a function of past actions. In recent years, there has been a growing interest in combining the above two frameworks and considering an MDP setting, where the cost function is allowed to change arbitrarily after each time step. However, most of the work in this area has been algorithmic: given a problem, one would design an algorithm from scratch and analyze its performance on a case-by-case basis. Moreover, the presence of the state and the assumption of an arbitrarily varying environment complicate both the theoretical analysis and the development of computationally efficient methods. This paper builds on recent results of Rakhlin et al. to give a general framework for deriving algorithms in an MDP setting with arbitrarily changing costs. This framework leads to a unifying view of existing methods and provides a general procedure for constructing new ones.
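The protocol described in the abstract can be made concrete with a toy sketch: transition dynamics are fixed and known, the cost function changes arbitrarily at every step, and regret is measured against the best stationary policy chosen in hindsight. This is an illustrative sketch only, not the paper's algorithm; the dynamics, the cost sequence, and the naive baseline learner are all assumptions made for this example.

```python
from itertools import product

N_STATES, N_ACTIONS, T = 3, 2, 120

def transition(state, action):
    # Fixed, known dynamics (deterministic for simplicity).
    return (state + action + 1) % N_STATES

def cost(t, state, action):
    # Arbitrarily varying cost sequence: the adversary may pick any
    # bounded costs per step; this is just a reproducible stand-in.
    return ((7 * t + 3 * state + 5 * action) % 10) / 10.0

def total_cost(policy):
    """Run a stationary deterministic policy (tuple: state -> action)
    against the realized cost sequence; return cumulative cost."""
    s, total = 0, 0.0
    for t in range(T):
        a = policy[s]
        total += cost(t, s, a)
        s = transition(s, a)
    return total

# Naive baseline learner: always play action 0 in every state.
learner = tuple(0 for _ in range(N_STATES))
learner_cost = total_cost(learner)

# Regret compares the learner to the best stationary deterministic
# policy chosen in hindsight, with the whole cost sequence known.
comparators = list(product(range(N_ACTIONS), repeat=N_STATES))
best_cost = min(total_cost(p) for p in comparators)
regret = learner_cost - best_cost
print(f"learner cost {learner_cost:.1f}, "
      f"best in hindsight {best_cost:.1f}, regret {regret:.1f}")
```

Because the baseline learner is itself one of the enumerated stationary policies, its regret here is guaranteed nonnegative; an online algorithm of the kind the paper develops would instead adapt its policy over time so that regret grows sublinearly in T.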
Pages: 471 - 476 (6 pages)
Related papers (50 total; first 10 shown)
  • [1] The Bayesian Prophet: A Low-Regret Framework for Online Decision Making
    Vera, Alberto
    Banerjee, Siddhartha
    MANAGEMENT SCIENCE, 2021, 67 (03) : 1368 - 1391
  • [2] The Bayesian Prophet: A Low-Regret Framework for Online Decision Making
    Vera, Alberto
    Banerjee, Siddhartha
    PERFORMANCE EVALUATION REVIEW, 2019, 47 (01): 81 - 82
  • [3] Dynamic Regret of Online Markov Decision Processes
    Zhao, Peng
    Li, Long-Fei
    Zhou, Zhi-Hua
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [4] Minimax Regret Optimisation for Robust Planning in Uncertain Markov Decision Processes
    Rigter, Marc
    Lacerda, Bruno
    Hawes, Nick
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 11930 - 11938
  • [5] Online Regret Bounds for Markov Decision Processes with Deterministic Transitions
    Ortner, Ronald
    ALGORITHMIC LEARNING THEORY, PROCEEDINGS, 2008, 5254 : 123 - 137
  • [6] Online regret bounds for Markov decision processes with deterministic transitions
    Ortner, Ronald
    THEORETICAL COMPUTER SCIENCE, 2010, 411 (29-30) : 2684 - 2695
  • [7] Simple Regret Optimization in Online Planning for Markov Decision Processes
    Feldman, Zohar
    Domshlak, Carmel
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2014, 51 : 165 - 205
  • [8] Contextual Recommendations and Low-Regret Cutting-Plane Algorithms
    Gollapudi, Sreenivas
    Guruganesh, Guru
    Kollias, Kostas
    Manurangsi, Pasin
    Leme, Renato Paes
    Schneider, Jon
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [9] Reinforcement Learning Algorithms for Regret Minimization in Structured Markov Decision Processes
    Prabuchandran, K. J.
    Bodas, Tejas
    Tulabandhula, Theja
    AAMAS'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS & MULTIAGENT SYSTEMS, 2016, : 1289 - 1290
  • [10] On Markov policies for minimax decision processes
    Iwamoto, S
    Tsurusaki, K
    JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS, 2001, 253 (01) : 58 - 78