Regret Bounds for Reinforcement Learning via Markov Chain Concentration

Authors
Ortner, Ronald [1]
Affiliations
[1] Univ Leoben, Dept Math & Informat Technol, Franz Josef Str 18, A-8700 Leoben, Austria
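Funding
Austrian Science Fund
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract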
We give a simple optimistic algorithm for which it is easy to derive regret bounds of $\tilde{O}(\sqrt{t_{\mathrm{mix}} SAT})$ after $T$ steps in uniformly ergodic Markov decision processes with $S$ states, $A$ actions, and mixing time parameter $t_{\mathrm{mix}}$. These bounds are the first regret bounds in the general, non-episodic setting with an optimal dependence on all given parameters. They could only be improved by using an alternative mixing time parameter.
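To get a feel for how the stated bound scales, the following minimal sketch evaluates only its leading term, sqrt(t_mix * S * A * T). It is an illustration under assumptions, not the paper's algorithm: the function name `regret_bound_order` is hypothetical, and the constants and logarithmic factors hidden by the tilde notation are dropped.

```python
import math


def regret_bound_order(t_mix: float, S: int, A: int, T: int) -> float:
    """Leading term sqrt(t_mix * S * A * T) of the regret bound.

    Assumption: constants and logarithmic factors hidden by the
    tilde-O notation are ignored, so only ratios between values
    are meaningful, not the absolute numbers.
    """
    if t_mix <= 0 or S <= 0 or A <= 0 or T <= 0:
        raise ValueError("t_mix, S, A and T must all be positive")
    return math.sqrt(t_mix * S * A * T)


if __name__ == "__main__":
    short = regret_bound_order(t_mix=10, S=5, A=2, T=1_000)
    long = regret_bound_order(t_mix=10, S=5, A=2, T=4_000)
    # Quadrupling the horizon T doubles the leading term.
    print(long / short)
```

Because the term grows like the square root of T, the regret per step (the bound divided by T) shrinks as T grows, which is what makes a bound of this form useful.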
Pages: 115-128
Page count: 14
Related papers
50 in total
  • [1] Regret Bounds for Reinforcement Learning via Markov Chain Concentration
    Ortner, Ronald
    Journal of Artificial Intelligence Research, 2020, 67 : 115 - 128
  • [2] Minimax Regret Bounds for Reinforcement Learning
    Azar, Mohammad Gheshlaghi
    Osband, Ian
    Munos, Remi
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [3] Variational Regret Bounds for Reinforcement Learning
    Ortner, Ronald
    Gajane, Pratik
    Auer, Peter
    35TH UNCERTAINTY IN ARTIFICIAL INTELLIGENCE CONFERENCE (UAI 2019), 2020, 115 : 81 - 90
  • [4] Regret Bounds for Learning State Representations in Reinforcement Learning
    Ortner, Ronald
    Pirotta, Matteo
    Fruit, Ronan
    Lazaric, Alessandro
    Maillard, Odalric-Ambrym
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [5] Variational Bayesian Reinforcement Learning with Regret Bounds
    O'Donoghue, Brendan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,
  • [6] Near-optimal Regret Bounds for Reinforcement Learning
    Jaksch, Thomas
    Ortner, Ronald
    Auer, Peter
    JOURNAL OF MACHINE LEARNING RESEARCH, 2010, 11 : 1563 - 1600
  • [7] Regret Bounds for Information-Directed Reinforcement Learning
    Hao, Botao
    Lattimore, Tor
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [8] Improved Regret Bounds for Undiscounted Continuous Reinforcement Learning
    Lakshmanan, K.
    Ortner, Ronald
    Ryabko, Daniil
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 37, 2015, 37 : 524 - 532
  • [9] Regret Bounds for Risk-Sensitive Reinforcement Learning
    Bastani, Osbert
    Ma, Yecheng Jason
    Shen, Estelle
    Xu, Wanqiao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [10] Kernelized Reinforcement Learning with Order Optimal Regret Bounds
    Vakili, Sattar
    Olkhovskaya, Julia
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,