An Improved Regret Bound for Thompson Sampling in the Gaussian Linear Bandit Setting

Cited: 0
Authors
Kalkanli, Cem [1]
Ozgur, Ayfer [1]
Affiliations
[1] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
Source
2020 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT) | 2020
Keywords
Thompson sampling; Gaussian linear bandit; Bayesian regret bounds; the Cauchy-Schwarz inequality
DOI
10.1109/ISIT44484.2020.9174371
CLC Classification
TP301 [Theory, Methods]
Discipline Code
081202
Abstract
Thompson sampling has been of significant recent interest due to its wide range of applicability to online learning problems and its good empirical and theoretical performance. In this paper, we analyze the performance of Thompson sampling in the canonical Gaussian linear bandit setting. We prove that the Bayesian regret of Thompson sampling in this setting is bounded by O(sqrt(T log(T))), improving on an earlier bound of O(sqrt(T) log(T)) in the literature for the case of an infinite, compact action set. Our proof relies on a Cauchy-Schwarz type inequality which can be of interest in its own right.
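The setting the abstract describes — a Gaussian prior over an unknown parameter, linear rewards, and actions chosen by sampling from the posterior — can be sketched in a few lines. The following is a minimal illustrative implementation, not the paper's analysis: the paper treats infinite compact action sets, while this sketch uses a finite action set, unit noise variance, and an identity prior covariance purely for simplicity.

```python
import numpy as np

# Minimal sketch of Thompson sampling for a Gaussian linear bandit.
# Assumptions (not from the paper): finite action set, N(0, I) prior,
# unit-variance Gaussian reward noise.
rng = np.random.default_rng(0)

d, T = 3, 500                        # dimension, horizon
theta_star = rng.normal(size=d)      # unknown parameter (a draw from the prior)
actions = rng.normal(size=(20, d))   # fixed finite action set
noise_sd = 1.0

# Conjugate Gaussian posterior tracked via precision matrix B and vector f:
# posterior is N(B^{-1} f, B^{-1}).
B = np.eye(d)
f = np.zeros(d)

opt = (actions @ theta_star).max()   # best achievable mean reward
regret = 0.0

for t in range(T):
    # Sample a parameter from the current posterior.
    cov = np.linalg.inv(B)
    theta_t = rng.multivariate_normal(cov @ f, cov)
    # Act greedily with respect to the sampled parameter.
    a = actions[np.argmax(actions @ theta_t)]
    reward = a @ theta_star + noise_sd * rng.normal()
    # Conjugate posterior update after observing (a, reward).
    B += np.outer(a, a) / noise_sd**2
    f += a * reward / noise_sd**2
    regret += opt - a @ theta_star

print(regret)
```

Under the paper's result, the Bayesian (prior-averaged) cumulative regret of this procedure grows as O(sqrt(T log T)), so the per-round regret of the sketch above vanishes as T grows.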
Pages: 2783-2788
Page count: 6
Related Papers
50 items total
  • [31] First-Order Bayesian Regret Analysis of Thompson Sampling
    Bubeck, Sebastien
    Sellke, Mark
    ALGORITHMIC LEARNING THEORY, VOL 117, 2020, 117 : 196 - 233
  • [32] First-Order Bayesian Regret Analysis of Thompson Sampling
    Bubeck, Sebastien
    Sellke, Mark
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2023, 69 (03) : 1795 - 1823
  • [33] Globally Informative Thompson Sampling for Structured Bandit Problems with Application to CrowdTranscoding
    Liu, Xingchi
    Derakhshani, Mahsa
    Zhu, Ziming
    Lambotharan, Sangarapillai
    3RD INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION (IEEE ICAIIC 2021), 2021, : 210 - 215
  • [34] Online (Multinomial) Logistic Bandit: Improved Regret and Constant Computation Cost
    Zhang, Yu-Jie
    Sugiyama, Masashi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [35] Improved Regret Bounds for Online Kernel Selection Under Bandit Feedback
    Li, Junfan
    Liao, Shizhong
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT IV, 2023, 13716 : 333 - 348
  • [36] Improved Regret Bounds for Projection-free Bandit Convex Optimization
    Garber, Dan
    Kretzu, Ben
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 2196 - 2205
  • [37] Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback
    Cassel, Asaf
    Luo, Haipeng
    Rosenberg, Aviv
    Sotnikov, Dmitry
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, 2024, 235
  • [39] LINEAR THOMPSON SAMPLING UNDER UNKNOWN LINEAR CONSTRAINTS
    Moradipari, Ahmadreza
    Alizadeh, Mahnoosh
    Thrampoulidis, Christos
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 3392 - 3396
  • [40] Multi-Agent Thompson Sampling for Bandit Applications with Sparse Neighbourhood Structures
    Verstraeten, Timothy
    Bargiacchi, Eugenio
    Libin, Pieter J. K.
    Helsen, Jan
    Roijers, Diederik M.
    Nowe, Ann
    SCIENTIFIC REPORTS, 2020, 10 (01)