Linear Thompson Sampling Revisited

Cited by: 0

Authors:

Abeille, Marc [1]

Lazaric, Alessandro [1]

Affiliations:

[1] Inria Lille Nord Europe, Team SequeL, Villeneuve d'Ascq, France

Source:

ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 54 | 2017 / Vol. 54

Keywords:

BANDIT; REGRET;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting. While we obtain a regret bound of order $O(d^{3/2}\sqrt{T})$ as in previous results, the proof sheds new light on the functioning of TS. We leverage the structure of the problem to show how the regret is related to the sensitivity (i.e., the gradient) of the objective function and how selecting optimal arms associated with optimistic parameters controls it. Thus we show that TS can be seen as a generic randomized algorithm where the sampling distribution is designed to have a fixed probability of being optimistic, at the cost of an additional $\sqrt{d}$ regret factor compared to a UCB-like approach. Furthermore, we show that our proof can be readily applied to regularized linear optimization and generalized linear model problems.
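
The randomized scheme the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a finite arm set, Gaussian reward noise, and a Gaussian perturbation of the regularized least-squares estimate; the function name `linear_ts` and the parameters `lam`, `noise_sd`, and `scale` are hypothetical, and `scale` merely stands in for the width of the sampling distribution, which the abstract does not specify.

```python
import numpy as np


def linear_ts(arms, theta_star, horizon, lam=1.0, noise_sd=0.1, scale=1.0, seed=0):
    """Illustrative linear Thompson sampling loop; returns cumulative regret.

    All names and defaults are hypothetical; `scale` stands in for the
    width of the sampling distribution, which the abstract does not specify.
    """
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    means = arms @ theta_star          # true expected reward of each arm
    V = lam * np.eye(d)                # regularized design matrix
    b = np.zeros(d)                    # running sum of reward * arm
    cumulative = np.zeros(horizon)
    total = 0.0
    for t in range(horizon):
        theta_hat = np.linalg.solve(V, b)          # regularized least-squares estimate
        chol = np.linalg.cholesky(V)               # V = chol @ chol.T
        eta = rng.standard_normal(d)
        # inv(chol.T) @ eta has covariance inv(V), so the perturbation is
        # larger in directions that have been explored less.
        theta_tilde = theta_hat + scale * np.linalg.solve(chol.T, eta)
        idx = int(np.argmax(arms @ theta_tilde))   # optimal arm for the sampled parameter
        reward = means[idx] + noise_sd * rng.standard_normal()
        V += np.outer(arms[idx], arms[idx])
        b += reward * arms[idx]
        total += means.max() - means[idx]          # instantaneous regret is >= 0
        cumulative[t] = total
    return cumulative


if __name__ == "__main__":
    # Toy problem with three arms in two dimensions (made-up values).
    arms = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
    theta_star = np.array([0.9, 0.2])
    print(linear_ts(arms, theta_star, horizon=500)[-1])
```

Widening the perturbation (a larger `scale`) raises the chance that the sampled parameter is optimistic but also increases exploration, which is the trade-off behind the extra $\sqrt{d}$ factor mentioned in the abstract.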


Pages: 176-184

Page count: 9

50 items in total

[41] THOMPSON SAMPLING MEETS RANKING AND SELECTION
Peng, Yijie
Zhang, Gongbo
2022 WINTER SIMULATION CONFERENCE (WSC), 2022, : 3075 - 3086
[42] A note on the advantage of context in Thompson sampling
Byrd, Michael
Darrow, Ross
JOURNAL OF REVENUE AND PRICING MANAGEMENT, 2021, 20 (03) : 316 - 321
[43] Contextual Combinatorial Cascading Thompson Sampling
Zhu, Zhenyu
Huang, Liusheng
Xu, Hongli
WIRELESS ALGORITHMS, SYSTEMS, AND APPLICATIONS, WASA 2019, 2019, 11604 : 520 - 532
[44] Thompson Sampling for Adversarial Bit Prediction
Lewi, Yuval
Kaplan, Haim
Mansour, Yishay
ALGORITHMIC LEARNING THEORY, VOL 117, 2020, 117 : 518 - 553
[45] A Thompson Sampling Algorithm for Cascading Bandits
Cheung, Wang Chi
Tan, Vincent Y. F.
Zhong, Zixin
22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89 : 438 - 447
[46] IntelligentPooling: practical Thompson sampling for mHealth
Sabina Tomkins
Peng Liao
Predrag Klasnja
Susan Murphy
Machine Learning, 2021, 110 : 2685 - 2727
[47] A note on the advantage of context in Thompson sampling
Michael Byrd
Ross Darrow
Journal of Revenue and Pricing Management, 2021, 20 : 316 - 321
[48] Freshness-Aware Thompson Sampling
Bouneffouf, Djallel
NEURAL INFORMATION PROCESSING, ICONIP 2014, PT III, 2014, 8836 : 373 - 380
[49] Thompson Sampling Itself is Differentially Private
Ou, Tingting
Medina, Marco Avella
Cummings, Rachel
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
[50] MOTS: Minimax Optimal Thompson Sampling
Jin, Tianyuan
Xu, Pan
Shi, Jieming
Xiao, Xiaokui
Gu, Quanquan
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
