Linear Thompson Sampling Revisited

Cited by: 0

Authors:

Abeille, Marc [1]

Lazaric, Alessandro [1]

Affiliations:

[1] Inria Lille Nord Europe, Team SequeL, Villeneuve d'Ascq, France

Source:

ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 54 | 2017 / Vol. 54

Keywords:

BANDIT; REGRET;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting. While we obtain a regret bound of order $O(d^{3/2}\sqrt{T})$ as in previous results, the proof sheds new light on the functioning of the TS. We leverage on the structure of the problem to show how the regret is related to the sensitivity (i.e., the gradient) of the objective function and how selecting optimal arms associated to optimistic parameters does control it. Thus we show that TS can be seen as a generic randomized algorithm where the sampling distribution is designed to have a fixed probability of being optimistic, at the cost of an additional $\sqrt{d}$ regret factor compared to a UCB-like approach. Furthermore, we show that our proof can be readily applied to regularized linear optimization and generalized linear model problems.


Pages: 176 - 184

Number of pages: 9

