From minimax value to low-regret algorithms for online Markov decision processes

Cited by: 0
Authors: Guan, Peng [1]; Raginsky, Maxim [1]; Willett, Rebecca [1]
Affiliation: [1] Duke Univ, Dept Elect & Comp Engn, Durham, NC 27708 USA
Keywords: (none listed)
DOI: not available
CLC classification: TP [automation technology; computer technology]
Discipline code: 0812
Abstract:
The standard Markov Decision Process (MDP) framework assumes a stationary (or at least predictable) environment. Online learning algorithms can cope with non-stationary or unpredictable environments, but they have no notion of a state that evolves over the course of learning as a function of past actions. In recent years, there has been growing interest in combining these two frameworks and considering an MDP setting in which the cost function is allowed to change arbitrarily after each time step. However, most of the work in this area has been algorithmic: given a problem, one would design an algorithm from scratch and analyze its performance on a case-by-case basis. Moreover, the presence of the state and the assumption of an arbitrarily varying environment complicate both the theoretical analysis and the development of computationally efficient methods. This paper builds on recent results of Rakhlin et al. to give a general framework for deriving algorithms in an MDP setting with arbitrarily changing costs. This framework leads to a unifying view of existing methods and provides a general procedure for constructing new ones.
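To make the setting concrete, here is a minimal Python sketch (not the paper's algorithm) of regret minimization in an MDP whose cost function changes arbitrarily at every step. It runs the exponentially weighted average forecaster (Hedge) over a small, hypothetical class of stationary policies; all names (P, policies, eta, horizon) are illustrative assumptions, and scoring the comparator policies along the learner's own trajectory sidesteps the state-distribution mismatch that the paper's minimax framework is designed to handle.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy finite MDP: 3 states, 2 actions, known transition kernel
    # P[s, a] = distribution over next states (hypothetical instance).
    n_states, n_actions, horizon = 3, 2, 500
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))

    # Hypothetical comparator class: deterministic stationary policies,
    # here simply "always play action a" for each action a.
    policies = [np.full(n_states, a) for a in range(n_actions)]

    # Hedge over the policy class with the standard learning rate.
    eta = np.sqrt(8.0 * np.log(len(policies)) / horizon)
    weights = np.ones(len(policies))
    cum_alg_cost = 0.0
    cum_policy_cost = np.zeros(len(policies))
    state = 0

    for t in range(horizon):
        # Adversary picks an arbitrary cost function c_t(s, a) in [0, 1];
        # it is revealed only after the learner commits to an action.
        c_t = rng.random((n_states, n_actions))

        # Sample a policy from the Hedge distribution and act with it.
        probs = weights / weights.sum()
        k = rng.choice(len(policies), p=probs)
        action = policies[k][state]
        cum_alg_cost += c_t[state, action]

        # Full-information feedback: evaluate every candidate policy at the
        # current state (a simplification, as noted above) and reweight.
        losses = np.array([c_t[state, pi[state]] for pi in policies])
        cum_policy_cost += losses
        weights *= np.exp(-eta * losses)

        # The state evolves as a function of the past action.
        state = rng.choice(n_states, p=P[state, action])

    regret = cum_alg_cost - cum_policy_cost.min()
    print(f"regret vs. best fixed policy in hindsight: {regret:.2f} "
          f"over {horizon} rounds")

Under these assumptions, Hedge's standard guarantee bounds the expected regret by sqrt((T/2) ln N) for N candidate policies over T rounds, which is the kind of low-regret behavior the abstract refers to; the paper's contribution is a general minimax-based procedure for deriving such algorithms rather than this particular construction.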
Pages: 471-476
Number of pages: 6
Related papers (items [31]-[40] of 50 shown):
  • [31] The empirical Bayes envelope and regret minimization in competitive Markov decision processes
    Mannor, S
    Shimkin, N
    MATHEMATICS OF OPERATIONS RESEARCH, 2003, 28 (02) : 327 - 345
  • [32] Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes
    Tian, Yi
    Qian, Jian
    Sra, Suvrit
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [33] Online Markov Decision Processes Configuration with Continuous Decision Space
    Maran, Davide
    Olivieri, Pierriccardo
    Stradi, Francesco Emanuele
    Urso, Giuseppe
    Gatti, Nicola
    Restelli, Marcello
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 13, 2024: 14315 - 14322
  • [34] Value set iteration for Markov decision processes
    Chang, Hyeong Soo
    AUTOMATICA, 2014, 50 (07) : 1940 - 1943
  • [35] Continuity of the value of competitive Markov decision processes
    Solan, E
    JOURNAL OF THEORETICAL PROBABILITY, 2003, 16 (04) : 831 - 845
  • [37] Verification of Markov Decision Processes Using Learning Algorithms
    Brazdil, Tomas
    Chatterjee, Krishnendu
    Chmelik, Martin
    Forejt, Vojtech
    Kretinsky, Jan
    Kwiatkowska, Marta
    Parker, David
    Ujma, Mateusz
    AUTOMATED TECHNOLOGY FOR VERIFICATION AND ANALYSIS, ATVA 2014, 2014, 8837 : 98 - 114
  • [38] Hierarchical algorithms for discounted and weighted Markov decision processes
    Abbad, M
    Daoui, C
    MATHEMATICAL METHODS OF OPERATIONS RESEARCH, 2003, 58 (02) : 237 - 245
  • [39] On some algorithms for limiting average Markov decision processes
    Daoui, C.
    Abbad, M.
    OPERATIONS RESEARCH LETTERS, 2007, 35 (02) : 261 - 266
  • [40] Improved Algorithms for Misspecified Linear Markov Decision Processes
    Vial, Daniel
    Parulekar, Advait
    Shakkottai, Sanjay
    Srikant, R.
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151