Regret Bounds for Reinforcement Learning via Markov Chain Concentration

Cited by: 0
Author
Ortner, Ronald [1]
Affiliation
[1] Univ Leoben, Dept Math & Informat Technol, Franz Josef Str 18, A-8700 Leoben, Austria
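Funding
Austrian Science Fund
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract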
We give a simple optimistic algorithm for which it is easy to derive regret bounds of $\tilde{O}(\sqrt{t_{\mathrm{mix}} SAT})$ after $T$ steps in uniformly ergodic Markov decision processes with $S$ states, $A$ actions, and mixing time parameter $t_{\mathrm{mix}}$. These bounds are the first regret bounds in the general, non-episodic setting with an optimal dependence on all given parameters. They could only be improved by using an alternative mixing time parameter.
Pages: 115-128
Page count: 14
Related papers
50 in total
  • [31] Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency
    Zhao, Heyang
    He, Jiafan
    Zhou, Dongruo
    Zhang, Tong
    Gu, Quanquan
    THIRTY SIXTH ANNUAL CONFERENCE ON LEARNING THEORY, VOL 195, 2023, 195
  • [32] Online Regret Bounds for Markov Decision Processes with Deterministic Transitions
    Ortner, Ronald
    ALGORITHMIC LEARNING THEORY, PROCEEDINGS, 2008, 5254 : 123 - 137
  • [33] Online regret bounds for Markov decision processes with deterministic transitions
    Ortner, Ronald
    THEORETICAL COMPUTER SCIENCE, 2010, 411 (29-30) : 2684 - 2695
  • [34] A Sublinear-Regret Reinforcement Learning Algorithm on Constrained Markov Decision Processes with reset action
    Watanabe, Takashi
    Sakuragawa, Takashi
    ICMLSC 2020: PROCEEDINGS OF THE 4TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND SOFT COMPUTING, 2020, : 51 - 55
  • [35] Regret Analysis in Deterministic Reinforcement Learning
    Tranos, Damianos
    Proutiere, Alexandre
    2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2021, : 2246 - 2251
  • [36] Optimal Regret Bounds for Collaborative Learning in Bandits
    Shidani, Amitis
    Vakili, Sattar
    INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 237, 2024, 237
  • [37] Regret Bounds for Transfer Learning in Bayesian Optimisation
    Shilton, Alistair
    Gupta, Sunil
    Rana, Santu
    Venkatesh, Svetha
    ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 54, 2017, 54 : 307 - 315
  • [38] Horizon-Free and Instance-Dependent Regret Bounds for Reinforcement Learning with General Function Approximation
    Huang, Jiayi
    Zhong, Han
    Wang, Liwei
    Yang, Lin F.
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [39] COMPLEXITY BOUNDS FOR MARKOV CHAIN MONTE CARLO ALGORITHMS VIA DIFFUSION LIMITS
    Roberts, Gareth O.
    Rosenthal, Jeffrey S.
    JOURNAL OF APPLIED PROBABILITY, 2016, 53 (02) : 410 - 420
  • [40] Reinforcement Learning with Logarithmic Regret and Policy Switches
    Velegkas, Grigoris
    Yang, Zhuoran
    Karbasi, Amin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,