Linear Thompson Sampling Revisited

Cited by: 0

Authors:

Abeille, Marc [1]

Lazaric, Alessandro [1]

Affiliations:

[1] Inria Lille Nord Europe, Team SequeL, Villeneuve d'Ascq, France

Source:

ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 54 | 2017 / Vol. 54

Keywords:

BANDIT; REGRET;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting. While we obtain a regret bound of order $O(d^{3/2}\sqrt{T})$ as in previous results, the proof sheds new light on the functioning of TS. We leverage the structure of the problem to show how the regret is related to the sensitivity (i.e., the gradient) of the objective function and how selecting optimal arms associated with optimistic parameters controls it. Thus we show that TS can be seen as a generic randomized algorithm where the sampling distribution is designed to have a fixed probability of being optimistic, at the cost of an additional $\sqrt{d}$ regret factor compared to a UCB-like approach. Furthermore, we show that our proof can be readily applied to regularized linear optimization and generalized linear model problems.
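
As a rough illustration of the setting the abstract describes, the following is a minimal sketch of Thompson sampling for a stochastic linear bandit with a finite arm set. It is a generic textbook-style version, not the authors' exact algorithm. The function name `linear_ts`, the `scale` parameter (standing in for the width of the sampling distribution), the Gaussian perturbation, and the noise model are all assumptions made for this example.

```python
import numpy as np

def linear_ts(arms, theta_star, horizon, lam=1.0, noise_sd=0.1, scale=1.0, seed=0):
    """Sketch of linear Thompson sampling on a finite arm set.

    arms       : (n_arms, d) array of arm feature vectors
    theta_star : (d,) true parameter, used only to simulate rewards
    scale      : hypothetical knob for how wide the sampling distribution is
    Returns the cumulative pseudo-regret and the final ridge estimate.
    """
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    V = lam * np.eye(d)          # regularized design matrix
    b = np.zeros(d)              # running sum of reward-weighted features
    best = np.max(arms @ theta_star)
    regret = 0.0
    for _ in range(horizon):
        theta_hat = np.linalg.solve(V, b)            # ridge estimate
        # Draw a perturbation with covariance V^{-1}: if L L^T = V^{-1},
        # then L @ eta has covariance V^{-1} for standard normal eta.
        L = np.linalg.cholesky(np.linalg.inv(V))
        theta_tilde = theta_hat + scale * (L @ rng.standard_normal(d))
        # Play the arm that is optimal for the sampled parameter.
        x = arms[int(np.argmax(arms @ theta_tilde))]
        reward = x @ theta_star + noise_sd * rng.standard_normal()
        V += np.outer(x, x)
        b += reward * x
        regret += best - x @ theta_star
    return regret, np.linalg.solve(V, b)

# Example run on three orthogonal arms (assumed toy problem).
arms = np.eye(3)
theta = np.array([0.1, 0.5, 0.9])
regret, estimate = linear_ts(arms, theta, horizon=200)
```

In this sketch, increasing `scale` widens the sampling distribution, which makes the sampled parameter optimistic more often at the price of more exploration; this mirrors, only loosely, the trade-off the abstract attributes to the extra $\sqrt{d}$ factor.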

Pages: 176-184

Page count: 9

