Regret Bounds for Reinforcement Learning via Markov Chain Concentration

Cited by: 0
Author
Ortner, Ronald [1]
Affiliation
[1] Univ Leoben, Dept Math & Informat Technol, Franz Josef Str 18, A-8700 Leoben, Austria
Funding
Austrian Science Fund
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We give a simple optimistic algorithm for which it is easy to derive regret bounds of $\tilde{O}(\sqrt{t_{\mathrm{mix}} S A T})$ after $T$ steps in uniformly ergodic Markov decision processes with $S$ states, $A$ actions, and mixing time parameter $t_{\mathrm{mix}}$. These bounds are the first regret bounds in the general, non-episodic setting with an optimal dependence on all given parameters. They could only be improved by using an alternative mixing time parameter.
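To make the scaling in the abstract concrete, the following minimal sketch evaluates only the leading term $\sqrt{t_{\mathrm{mix}} S A T}$ of the stated bound. It is not the paper's algorithm: the function name `regret_leading_term` and the parameter values are assumptions chosen for illustration, and the constants and logarithmic factors hidden by the $\tilde{O}$ notation are deliberately omitted.

```python
import math


def regret_leading_term(t_mix: float, S: int, A: int, T: int) -> float:
    """Return sqrt(t_mix * S * A * T), the leading term of the stated bound.

    Constants and logarithmic factors hidden by the O-tilde notation are
    omitted, so this shows scaling only, not an actual regret guarantee.
    """
    if min(t_mix, S, A, T) <= 0:
        raise ValueError("all parameters must be positive")
    return math.sqrt(t_mix * S * A * T)


# Hypothetical parameter values, chosen only for illustration.
for T in (10_000, 40_000):
    term = regret_leading_term(t_mix=25, S=10, A=4, T=T)
    # term / T is the per-step share of the leading term, which shrinks as T grows.
    print(T, term, term / T)
```

Because the term grows with the square root of $T$, quadrupling the horizon only doubles it, so the per-step share decreases as the horizon grows.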
Pages: 115-128
Page count: 14
Related papers
50 in total
  • [21] Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds
    Liang, Hao
    Luo, Zhi-Quan
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25
  • [22] Regret Bounds for Lifelong Learning
    Alquier, Pierre
    The Tien Mai
    Pontil, Massimiliano
    ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 54, 2017, 54: 261-269
  • [23] Tighter Robust Upper Bounds for Options via No-Regret Learning
    Xue, Shan
    Du, Ye
    Xu, Liang
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 4, 2023: 5348-5356
  • [24] Regret Bounds for Risk-sensitive Reinforcement Learning with Lipschitz Dynamic Risk Measures
    Liang, Hao
    Luo, Zhi-Quan
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [25] Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning
    Fei, Yingjie
    Yang, Zhuoran
    Chen, Yudong
    Wang, Zhaoran
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [26] No-regret Reinforcement Learning
    Gopalan, Aditya
    2019 FIFTH INDIAN CONTROL CONFERENCE (ICC), 2019: 16
  • [27] SkinnerDB: Regret-Bounded Query Evaluation via Reinforcement Learning
    Trummer, Immanuel
    Wang, Junxiong
    Maram, Deepak
    Moseley, Samuel
    Jo, Saehan
    Antonakakis, Joseph
    SIGMOD '19: PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2019: 1153-1170
  • [28] SkinnerDB: Regret-bounded Query Evaluation via Reinforcement Learning
    Trummer, Immanuel
    Wang, Junxiong
    Wei, Ziyun
    Maram, Deepak
    Moseley, Samuel
    Jo, Saehan
    Antonakakis, Joseph
    Rayabhari, Ankush
    ACM TRANSACTIONS ON DATABASE SYSTEMS, 2021, 46 (03)
  • [29] SkinnerDB: Regret-Bounded Query Evaluation via Reinforcement Learning
    Trummer, Immanuel
    Moseley, Samuel
    Maram, Deepak
    Jo, Saehan
    Antonakakis, Joseph
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2018, 11 (12): 2074-2077
  • [30] Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds
    Zanette, Andrea
    Brunskill, Emma
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97