Linear Thompson Sampling Revisited

Cited by: 0

Authors:

Abeille, Marc [1]

Lazaric, Alessandro [1]

Affiliation:

[1] Inria Lille Nord Europe, Team SequeL, Villeneuve d'Ascq, France

Source:

ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 54 | 2017 / Vol. 54

Keywords:

BANDIT; REGRET;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting. While we obtain a regret bound of order $O(d^{3/2}\sqrt{T})$ as in previous results, the proof sheds new light on the functioning of the TS. We leverage on the structure of the problem to show how the regret is related to the sensitivity (i.e., the gradient) of the objective function and how selecting optimal arms associated to optimistic parameters does control it. Thus we show that TS can be seen as a generic randomized algorithm where the sampling distribution is designed to have a fixed probability of being optimistic, at the cost of an additional $\sqrt{d}$ regret factor compared to a UCB-like approach. Furthermore, we show that our proof can be readily applied to regularized linear optimization and generalized linear model problems.
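
The randomized scheme the abstract describes can be illustrated with a minimal sketch. This is not the authors' code: the function name `linear_ts`, the fixed scale `beta`, the Gaussian perturbation, the finite arm set, and all default values are assumptions chosen for illustration. Each round fits a regularized least-squares estimate, perturbs it with noise shaped by the inverse design matrix, and plays the arm that is optimal for the perturbed parameter.

```python
import numpy as np

def linear_ts(arms, theta_star, T=500, lam=1.0, beta=1.0, noise_sd=0.1, seed=0):
    """Illustrative linear Thompson sampling on a finite arm set.

    arms: (K, d) array of arm feature vectors (hypothetical setup).
    theta_star: (d,) true parameter, used only to simulate rewards.
    beta: fixed perturbation scale (an assumption; the analysis would
          use a time-dependent confidence radius instead).
    """
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    V = lam * np.eye(d)          # regularized design matrix
    b = np.zeros(d)              # running sum of reward-weighted features
    best = np.max(arms @ theta_star)
    regret = 0.0
    theta_hat = np.zeros(d)
    for _ in range(T):
        # Regularized least-squares estimate.
        theta_hat = np.linalg.solve(V, b)
        # Factor V^{-1} = L L^T so that L @ eta has covariance V^{-1}.
        L = np.linalg.cholesky(np.linalg.inv(V))
        # Sampled (perturbed) parameter.
        theta_tilde = theta_hat + beta * (L @ rng.standard_normal(d))
        # Play the arm that is optimal for the sampled parameter.
        x = arms[int(np.argmax(arms @ theta_tilde))]
        reward = x @ theta_star + noise_sd * rng.standard_normal()
        V += np.outer(x, x)
        b += reward * x
        regret += best - x @ theta_star
    return regret, theta_hat

if __name__ == "__main__":
    arms = np.eye(3)
    theta_star = np.array([1.0, 0.5, 0.0])
    total_regret, estimate = linear_ts(arms, theta_star)
    print(total_regret, estimate)
```

A standard Gaussian vector in dimension d has norm on the order of $\sqrt{d}$, which is one informal way to read the extra $\sqrt{d}$ factor mentioned in the abstract relative to a UCB-like rule that stays inside a fixed confidence ellipsoid.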


Pages: 176-184

Page count: 9

