From minimax value to low-regret algorithms for online Markov decision processes

Cited: 0
Authors
Guan, Peng [1]
Raginsky, Maxim [1]
Willett, Rebecca [1]
Affiliations
[1] Duke University, Department of Electrical and Computer Engineering, Durham, NC 27708, USA
Keywords
DOI
Not available
CLC Classification Number
TP [Automation Technology; Computer Technology]
Discipline Classification Code
0812
Abstract
The standard Markov Decision Process (MDP) framework assumes a stationary (or at least predictable) environment. Online learning algorithms can handle non-stationary or unpredictable environments, but they lack a notion of a state that evolves throughout the learning process as a function of past actions. In recent years, there has been growing interest in combining these two frameworks by considering an MDP setting in which the cost function is allowed to change arbitrarily after each time step. However, most of the work in this area has been algorithmic: given a problem, one designs an algorithm from scratch and analyzes its performance on a case-by-case basis. Moreover, the presence of the state and the assumption of an arbitrarily varying environment complicate both the theoretical analysis and the development of computationally efficient methods. This paper builds on recent results of Rakhlin et al. to give a general framework for deriving algorithms in an MDP setting with arbitrarily changing costs. This framework leads to a unifying view of existing methods and provides a general procedure for constructing new ones.
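For context, the online-MDP line of work this paper belongs to typically measures performance by regret against the best stationary policy in hindsight. A standard formulation of this benchmark (a sketch; the notation below is illustrative and not quoted from the paper) is

$$ R_T \;=\; \mathbb{E}\!\left[ \sum_{t=1}^{T} c_t(x_t, a_t) \right] \;-\; \min_{\pi \in \Pi}\, \mathbb{E}^{\pi}\!\left[ \sum_{t=1}^{T} c_t(x_t^{\pi}, a_t^{\pi}) \right], $$

where $x_t$ and $a_t$ are the learner's state and action at step $t$, $c_1, \ldots, c_T$ is the arbitrarily varying sequence of cost functions chosen by the environment, and $\Pi$ is a comparison class of stationary policies. An algorithm is called low-regret if $R_T$ grows sublinearly in $T$.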
Pages: 471-476 (6 pages)
Related Papers
50 items in total
  • [41] Learning algorithms for Markov decision processes with average cost
    Abounadi, J
    Bertsekas, D
    Borkar, VS
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2001, 40 (03) : 681 - 698
  • [42] Combining Learning Algorithms: An Approach to Markov Decision Processes
    Ribeiro, Richardson
    Favarim, Fabio
    Barbosa, Marco A. C.
    Koerich, Alessandro L.
    Enembreck, Fabricio
    ENTERPRISE INFORMATION SYSTEMS, ICEIS 2012, 2013, 141 : 172 - 188
  • [43] Hierarchical algorithms for discounted and weighted Markov decision processes
    Abbad, M
    Daoui, C
    MATHEMATICAL METHODS OF OPERATIONS RESEARCH, 2003, 58 : 237 - 245
  • [44] IMED-RL: Regret optimal learning of ergodic Markov decision processes
    Pesquerel, Fabien
    Maillard, Odalric-Ambrym
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [45] Sampling Based Approaches for Minimizing Regret in Uncertain Markov Decision Processes (MDPs)
    Ahmed, Asrar
    Varakantham, Pradeep
    Lowalekar, Meghna
    Adulyasak, Yossiri
    Jaillet, Patrick
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2017, 59 : 229 - 264
  • [46] Online Learning of Safety function for Markov Decision Processes
    Mazumdar, Abhijit
    Wisniewski, Rafal
    Bujorianu, Manuela L.
    2023 EUROPEAN CONTROL CONFERENCE (ECC), 2023
  • [47] Online Convex Optimization in Adversarial Markov Decision Processes
    Rosenberg, Aviv
    Mansour, Yishay
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019
  • [48] Online Markov Decision Processes Under Bandit Feedback
    Neu, Gergely
    Gyoergy, Andras
    Szepesvari, Csaba
    Antos, Andras
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2014, 59 (03) : 676 - 691
  • [49] Online Learning in Markov Decision Processes with Continuous Actions
    Hong, Yi-Te
    Lu, Chi-Jen
    ALGORITHMIC LEARNING THEORY, ALT 2015, 2015, 9355 : 302 - 316
  • [50] Provably Efficient Representation Selection in Low-rank Markov Decision Processes: From Online to Offline RL
    Zhang, Weitong
    He, Jiafan
    Zhou, Dongruo
    Zhang, Amy
    Gu, Quanquan
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216 : 2488 - 2497