An Improved Regret Bound for Thompson Sampling in the Gaussian Linear Bandit Setting

Cited by: 0
Authors
Kalkanli, Cem [1 ]
Ozgur, Ayfer [1 ]
Affiliations
[1] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
Source
2020 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT) | 2020
Keywords
Thompson sampling; Gaussian linear bandit; Bayesian regret bounds; the Cauchy-Schwarz inequality;
DOI
10.1109/isit44484.2020.9174371
CLC classification
TP301 [Theory, Methods];
Subject classification
081202 ;
Abstract
Thompson sampling has been of significant recent interest due to its wide range of applicability to online learning problems and its good empirical and theoretical performance. In this paper, we analyze the performance of Thompson sampling in the canonical Gaussian linear bandit setting. We prove that the Bayesian regret of Thompson sampling in this setting is bounded by O(sqrt(T log(T))), improving on an earlier bound of O(sqrt(T) log(T)) in the literature for the case of an infinite, compact action set. Our proof relies on a Cauchy-Schwarz-type inequality which may be of interest in its own right.
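The algorithm analyzed in the abstract can be illustrated with a minimal sketch of Thompson sampling in a Gaussian linear bandit: maintain a Gaussian posterior over the unknown parameter, sample from it each round, and act greedily with respect to the sample. This is a toy instance under assumed dimensions, a finite action set standing in for the paper's compact one, and unit noise variance; it is not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy instance (dimensions and arm count are assumptions):
d, T = 3, 2000
theta_star = rng.normal(size=d)                 # true parameter, drawn from the N(0, I) prior
actions = rng.normal(size=(50, d))
actions /= np.linalg.norm(actions, axis=1, keepdims=True)  # unit-norm arms
noise_sd = 1.0

# Gaussian posterior over theta under the N(0, I) prior and Gaussian noise
Sigma_inv = np.eye(d)   # posterior precision
b = np.zeros(d)         # running sum of x_t * r_t (scaled by noise precision)

best_mean = (actions @ theta_star).max()
regret = 0.0
for t in range(T):
    Sigma = np.linalg.inv(Sigma_inv)
    mu = Sigma @ b
    theta_t = rng.multivariate_normal(mu, Sigma)    # posterior sample
    x = actions[np.argmax(actions @ theta_t)]       # greedy action for the sample
    r = x @ theta_star + noise_sd * rng.normal()    # noisy linear reward
    Sigma_inv += np.outer(x, x) / noise_sd**2       # conjugate Gaussian update
    b += x * r / noise_sd**2
    regret += best_mean - x @ theta_star            # per-round (pseudo-)regret

avg_regret = regret / T
```

Under the paper's result, the cumulative Bayesian regret of this procedure grows like sqrt(T log T), so the average per-round regret `avg_regret` should shrink toward zero as T grows.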
Pages: 2783-2788
Page count: 6
Related papers
50 items in total
  • [21] Society of Agents: Regret Bounds of Concurrent Thompson Sampling
    Chen, Yan
    Dong, Perry
    Bai, Qinxun
    Dimakopoulou, Maria
    Xu, Wei
    Zhou, Zhengyuan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [22] Feedback graph regret bounds for Thompson Sampling and UCB
    Lykouris, Thodoris
    Tardos, Eva
    Wali, Drishti
    ALGORITHMIC LEARNING THEORY, VOL 117, 2020, 117 : 592 - 614
  • [23] Regret Bounds for Expected Improvement Algorithms in Gaussian Process Bandit Optimization
    Tran-The, Hung
    Gupta, Sunil
    Rana, Santu
    Venkatesh, Svetha
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [24] Bandit Algorithms Based on Thompson Sampling for Bounded Reward Distributions
    Riou, Charles
    Honda, Junya
    ALGORITHMIC LEARNING THEORY, VOL 117, 2020, 117 : 777 - 826
  • [25] The Choice of Noninformative Priors for Thompson Sampling in Multiparameter Bandit Models
    Lee, Jongyeong
    Chiang, Chao-Kai
    Sugiyama, Masashi
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 12, 2024, : 13383 - 13390
  • [26] A Thompson Sampling Approach to Unifying Causal Inference and Bandit Learning
    Xu, Hanxuan
    Xie, Hong
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2023, PT II, 2023, 13936 : 255 - 266
  • [27] An improved upper bound on the expected regret of UCB-type policies for a matching-selection bandit problem
    Watanabe, Ryo
    Nakamura, Atsuyoshi
    Kudo, Mineichi
    OPERATIONS RESEARCH LETTERS, 2015, 43 (06) : 558 - 563
  • [28] Linear Thompson Sampling Revisited
    Abeille, Marc
    Lazaric, Alessandro
    ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 54, 2017, 54 : 176 - 184
  • [29] Improved algorithms for bandit with graph feedback via regret decomposition
    He, Yuchen
    Zhang, Chihao
    THEORETICAL COMPUTER SCIENCE, 2023, 979
  • [30] Linear Thompson sampling revisited
    Abeille, Marc
    Lazaric, Alessandro
    ELECTRONIC JOURNAL OF STATISTICS, 2017, 11 (02): : 5165 - 5197