Regret Bounds for Reinforcement Learning via Markov Chain Concentration

Cited by: 0

Authors:

Ortner, Ronald [1]

Affiliations:

[1] Univ Leoben, Dept Math & Informat Technol, Franz Josef Str 18, A-8700 Leoben, Austria

Source:

JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH | 2020, Vol. 67

Funding:

Austrian Science Fund

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

We give a simple optimistic algorithm for which it is easy to derive regret bounds of $\tilde{O}(\sqrt{t_{\mathrm{mix}} S A T})$ after $T$ steps in uniformly ergodic Markov decision processes with $S$ states, $A$ actions, and mixing time parameter $t_{\mathrm{mix}}$. These bounds are the first regret bounds in the general, non-episodic setting with an optimal dependence on all given parameters. They could only be improved by using an alternative mixing time parameter.

Pages: 115-128

Page count: 14
