An Improved Regret Bound for Thompson Sampling in the Gaussian Linear Bandit Setting

Times cited: 0

Authors:

Kalkanli, Cem [1]

Ozgur, Ayfer [1]

Affiliation:

[1] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA

Source:

2020 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT) | 2020

Keywords:

Thompson sampling; Gaussian linear bandit; Bayesian regret bounds; the Cauchy-Schwarz inequality

DOI:

10.1109/isit44484.2020.9174371

Chinese Library Classification (CLC) number:

TP301 [Theory, methods]

Discipline classification code:

081202

Abstract:

Thompson sampling has attracted significant recent interest because of its wide applicability to online learning problems and its strong empirical and theoretical performance. In this paper, we analyze the performance of Thompson sampling in the canonical Gaussian linear bandit setting. We prove that the Bayesian regret of Thompson sampling in this setting is bounded by $O(\sqrt{T\log(T)})$, improving on an earlier bound of $O(\sqrt{T}\log(T))$ in the literature for the case of an infinite, compact action set. Our proof relies on a Cauchy-Schwarz type inequality that may be of independent interest.
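To make the setting concrete, the following is a minimal NumPy sketch of Thompson sampling in a Gaussian linear bandit. It is not taken from the paper and does not reproduce its analysis or its Cauchy-Schwarz type inequality. It assumes a N(0, prior_std² I) prior on the unknown parameter, Gaussian reward noise with known variance, and the unit sphere as the infinite, compact action set (chosen here only because the per-round maximization then has a closed form). The function names `thompson_sampling_unit_sphere` and `estimate_bayesian_regret` are hypothetical.

```python
import numpy as np


def thompson_sampling_unit_sphere(theta_star, horizon, noise_std=1.0,
                                  prior_std=1.0, seed=0):
    """Thompson sampling on a Gaussian linear bandit whose action set is the
    unit sphere {a : ||a|| = 1} (an assumed infinite, compact action set).

    Assumed model: theta ~ N(0, prior_std^2 I), reward = <a, theta> + noise,
    noise ~ N(0, noise_std^2) with known variance. Returns the cumulative
    regret after each round for the fixed true parameter theta_star.
    """
    rng = np.random.default_rng(seed)
    theta_star = np.asarray(theta_star, dtype=float)
    d = theta_star.shape[0]
    # Gaussian posterior kept in natural parameters, initialized from the
    # prior: precision matrix and b = precision @ posterior_mean.
    precision = np.eye(d) / prior_std**2
    b = np.zeros(d)
    # Over the unit sphere, max_a <a, theta_star> = ||theta_star||.
    best_reward = np.linalg.norm(theta_star)
    cumulative = np.empty(horizon)
    total = 0.0
    for t in range(horizon):
        cov = np.linalg.inv(precision)
        cov = 0.5 * (cov + cov.T)  # symmetrize against round-off
        mean = cov @ b
        # 1) Sample a parameter from the current posterior.
        theta_sample = mean + np.linalg.cholesky(cov) @ rng.standard_normal(d)
        # 2) Act greedily w.r.t. the sample; on the unit sphere the
        #    maximizer of <a, theta_sample> is the normalized sample.
        action = theta_sample / np.linalg.norm(theta_sample)
        # 3) Observe a noisy linear reward.
        reward = action @ theta_star + noise_std * rng.standard_normal()
        # 4) Conjugate Gaussian (Bayesian linear regression) update.
        precision += np.outer(action, action) / noise_std**2
        b += reward * action / noise_std**2
        # Per-round regret is >= 0 by Cauchy-Schwarz since ||action|| = 1.
        total += best_reward - action @ theta_star
        cumulative[t] = total
    return cumulative


def estimate_bayesian_regret(d, horizon, n_runs=20, prior_std=1.0, seed=0):
    """Monte Carlo estimate of the Bayesian regret at `horizon`: the expected
    cumulative regret when theta_star itself is drawn from the prior."""
    rng = np.random.default_rng(seed)
    finals = []
    for _ in range(n_runs):
        theta_star = prior_std * rng.standard_normal(d)
        run_seed = int(rng.integers(2**31))
        finals.append(thompson_sampling_unit_sphere(
            theta_star, horizon, prior_std=prior_std, seed=run_seed)[-1])
    return float(np.mean(finals))
```

For example, calling `estimate_bayesian_regret(d=3, horizon=T)` for several horizons T gives an empirical view of how the Bayesian regret grows with T; such a simulation can only illustrate sublinear growth and says nothing about the constants or logarithmic factors in the bound above.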


Pages: 2783-2788

Number of pages: 6
