Linear Thompson Sampling Revisited

Cited by: 0
Authors
Abeille, Marc [1]
Lazaric, Alessandro [1]
Affiliations
[1] Inria Lille Nord Europe, Team SequeL, Villeneuve d'Ascq, France
Source
ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 54 | 2017 / Vol. 54
Keywords
BANDIT; REGRET;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting. While we obtain a regret bound of order $O(d^{3/2}\sqrt{T})$ as in previous results, the proof sheds new light on the functioning of TS. We leverage the structure of the problem to show how the regret is related to the sensitivity (i.e., the gradient) of the objective function and how selecting optimal arms associated with optimistic parameters controls it. Thus we show that TS can be seen as a generic randomized algorithm where the sampling distribution is designed to have a fixed probability of being optimistic, at the cost of an additional $\sqrt{d}$ regret factor compared to a UCB-like approach. Furthermore, we show that our proof can be readily applied to regularized linear optimization and generalized linear model problems.
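To make the setting in the abstract concrete, the following is a minimal sketch of Thompson sampling for a stochastic linear bandit, not the authors' implementation. It assumes a finite arm set, Gaussian reward noise, a regularized least-squares estimate, and a Gaussian perturbation of that estimate. The function name and the parameters `noise_std`, `reg`, and `scale` are hypothetical; `scale` stands in for the inflation of the sampling distribution that the abstract links to a fixed probability of being optimistic, and its value here is not taken from the paper.

```python
import numpy as np


def linear_thompson_sampling(arms, theta_star, horizon,
                             noise_std=0.1, reg=1.0, scale=1.0, seed=0):
    """Run a simple linear TS loop and return the cumulative regret.

    arms:       array of shape (n_arms, d), one feature vector per arm
    theta_star: true parameter of shape (d,), used only to simulate rewards
    scale:      hypothetical inflation factor of the sampling distribution
    """
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    V = reg * np.eye(d)            # regularized design matrix
    b = np.zeros(d)                # running sum of reward-weighted features
    best_value = np.max(arms @ theta_star)
    instant_regret = np.zeros(horizon)

    for t in range(horizon):
        # Regularized least-squares estimate of the parameter.
        theta_hat = np.linalg.solve(V, b)

        # Sample theta_tilde ~ N(theta_hat, scale^2 * V^{-1}) via Cholesky:
        # with V = L L^T, the vector L^{-T} z has covariance V^{-1}.
        L = np.linalg.cholesky(V)
        z = rng.standard_normal(d)
        theta_tilde = theta_hat + scale * np.linalg.solve(L.T, z)

        # Play the arm that is optimal for the sampled parameter.
        x = arms[int(np.argmax(arms @ theta_tilde))]
        reward = x @ theta_star + noise_std * rng.standard_normal()

        # Update the sufficient statistics with the new observation.
        V += np.outer(x, x)
        b += reward * x
        instant_regret[t] = best_value - x @ theta_star

    return np.cumsum(instant_regret)


if __name__ == "__main__":
    demo_rng = np.random.default_rng(1)
    demo_arms = demo_rng.standard_normal((20, 5))
    demo_theta = demo_rng.standard_normal(5)
    regret = linear_thompson_sampling(demo_arms, demo_theta, horizon=500)
    print("cumulative regret after 500 rounds:", regret[-1])
```

Increasing `scale` makes the sampled parameter optimistic more often but also widens exploration, which is the trade-off the abstract describes qualitatively.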
Pages: 176-184
Number of pages: 9