An Improved Regret Bound for Thompson Sampling in the Gaussian Linear Bandit Setting

Cited by: 0
Authors
Kalkanli, Cem [1 ]
Ozgur, Ayfer [1 ]
Affiliations
[1] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
Keywords
Thompson sampling; Gaussian linear bandit; Bayesian regret bounds; Cauchy-Schwarz inequality
DOI
10.1109/isit44484.2020.9174371
CLC Number
TP301 [Theory, Methods]
Subject Classification Code
081202
Abstract
Thompson sampling has attracted significant recent interest due to its wide applicability to online learning problems and its good empirical and theoretical performance. In this paper, we analyze the performance of Thompson sampling in the canonical Gaussian linear bandit setting. We prove that the Bayesian regret of Thompson sampling in this setting is bounded by O(√(T log(T))), improving on an earlier bound of O(√T log(T)) in the literature for the case of an infinite and compact action set. Our proof relies on a Cauchy-Schwarz type inequality which can be of interest in its own right.
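The abstract describes the setting but gives no pseudocode; the following is a minimal illustrative sketch of Thompson sampling in a Gaussian linear bandit, not the paper's analysis. It assumes a finite action set (the paper treats infinite, compact action sets), a standard normal prior on the parameter, known noise level sigma, and arbitrary illustration values for the dimension d, action count K, and horizon T.

```python
# Minimal Thompson sampling sketch for a Gaussian linear bandit.
# Illustrative only: finite action set, N(0, I) prior, and known noise
# variance are assumptions for this sketch, not details from the paper.
import numpy as np

rng = np.random.default_rng(0)

d, K, T = 5, 50, 2000           # dimension, number of actions, horizon (illustration values)
sigma = 1.0                     # known Gaussian noise standard deviation
actions = rng.normal(size=(K, d))
actions /= np.linalg.norm(actions, axis=1, keepdims=True)  # actions on the unit sphere

theta_star = rng.normal(size=d)           # true parameter, drawn from the N(0, I) prior

# Gaussian posterior over theta: track the precision matrix and b = (1/sigma^2) * X^T r.
precision = np.eye(d)
b = np.zeros(d)

best_mean = actions @ theta_star
regret = 0.0
for t in range(T):
    cov = np.linalg.inv(precision)
    mean = cov @ b
    theta_sample = rng.multivariate_normal(mean, cov)   # sample from the posterior
    a_idx = int(np.argmax(actions @ theta_sample))      # act greedily w.r.t. the sample
    a = actions[a_idx]
    reward = a @ theta_star + sigma * rng.normal()      # noisy linear reward
    # Standard Bayesian linear-regression update of the Gaussian posterior.
    precision += np.outer(a, a) / sigma**2
    b += reward * a / sigma**2
    regret += best_mean.max() - best_mean[a_idx]         # accumulate pseudo-regret

print(f"cumulative regret after T={T} rounds: {regret:.2f}")
```

Plotting the cumulative regret against √(T log T) over several runs gives a quick empirical sense of the sublinear growth that the paper's bound quantifies.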
Pages: 2783 - 2788
Number of pages: 6
Related Papers
50 records in total
  • [1] A Thompson Sampling Algorithm With Logarithmic Regret for Unimodal Gaussian Bandit
    Yang, Long
    Li, Zhao
    Hu, Zehong
    Ruan, Shasha
    Pan, Gang
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (09) : 5332 - 5341
  • [2] Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems
    Jung, Young Hun
    Tewari, Ambuj
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [3] Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems
    Abeille, Marc
    Lazaric, Alessandro
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [4] Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting
    Srinivas, Niranjan
    Krause, Andreas
    Kakade, Sham M.
    Seeger, Matthias W.
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2012, 58 (05) : 3250 - 3265
  • [5] Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning
    Moradipari, Ahmadreza
    Pedramfar, Mohammad
    Zini, Modjtaba Shokrian
    Aggarwal, Vaneet
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [6] Thompson Sampling for the Multinomial Logit Bandit
    Agrawal, Shipra
    Avadhanula, Vashist
    Goyal, Vineet
    Zeevi, Assaf
    MATHEMATICS OF OPERATIONS RESEARCH, 2025
  • [7] Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays
    Komiyama, Junpei
    Honda, Junya
    Nakagawa, Hiroshi
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 37, 2015, 37 : 1152 - 1161
  • [8] Regret lower bound and optimal algorithm for high-dimensional contextual linear bandit
    Li, Ke
    Yang, Yun
    Narisetty, Naveen N.
    ELECTRONIC JOURNAL OF STATISTICS, 2021, 15 (02): : 5652 - 5695
  • [9] Self-accelerated Thompson sampling with near-optimal regret upper bound
    Zhu, Zhenyu
    Huang, Liusheng
    Xu, Hongli
    NEUROCOMPUTING, 2020, 399 : 37 - 47
  • [10] Regret Bounds for Safe Gaussian Process Bandit Optimization
    Amani, Sanae
    Alizadeh, Mahnoosh
    Thrampoulidis, Christos
    2021 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2021, : 527 - 532