Linear Thompson Sampling Revisited

Authors
Abeille, Marc [1]
Lazaric, Alessandro [1]
Affiliations
[1] Inria Lille Nord Europe, Team SequeL, Villeneuve d'Ascq, France
Source
ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 54 | 2017 / Vol. 54
Keywords
BANDIT; REGRET;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting. While we obtain a regret bound of order $O(d^{3/2}\sqrt{T})$, as in previous results, the proof sheds new light on how TS works. We leverage the structure of the problem to show how the regret is related to the sensitivity (i.e., the gradient) of the objective function and how selecting optimal arms associated with optimistic parameters controls it. Thus we show that TS can be seen as a generic randomized algorithm where the sampling distribution is designed to have a fixed probability of being optimistic, at the cost of an additional $\sqrt{d}$ regret factor compared to a UCB-like approach. Furthermore, we show that our proof can be readily applied to regularized linear optimization and generalized linear model problems.
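To make the sampling step described in the abstract concrete, the following is a minimal sketch of a linear Thompson sampling loop, not the authors' implementation. It assumes a finite arm set, Gaussian reward noise, a Gaussian perturbation, and a constant scale `beta` in place of the time-dependent confidence radius used in the analysis. The function name `linear_ts` and all parameter names are hypothetical.

```python
import numpy as np


def linear_ts(arms, theta_star, T=500, lam=1.0, beta=1.0, noise=0.1, seed=0):
    """Simplified linear Thompson sampling on a finite arm set.

    Assumptions (not taken from the paper): constant perturbation scale
    `beta`, Gaussian perturbation, Gaussian reward noise.
    Returns the final regularized least-squares estimate and per-step regret.
    """
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    V = lam * np.eye(d)        # regularized design matrix
    b = np.zeros(d)            # running sum of (reward * arm)
    means = arms @ theta_star  # true expected rewards, used to simulate and score
    best = means.max()
    regret = np.zeros(T)
    for t in range(T):
        theta_hat = np.linalg.solve(V, b)         # regularized least-squares estimate
        L = np.linalg.cholesky(np.linalg.inv(V))  # L @ L.T equals V^{-1}
        # Perturb the estimate; L @ eta has covariance V^{-1} for standard normal eta.
        theta_tilde = theta_hat + beta * (L @ rng.standard_normal(d))
        a = int(np.argmax(arms @ theta_tilde))    # arm that is optimal for the sample
        x = arms[a]
        r = means[a] + noise * rng.standard_normal()
        V += np.outer(x, x)
        b += r * x
        regret[t] = best - means[a]
    return np.linalg.solve(V, b), regret


# Usage: 20 arms on the unit circle in dimension 2 (illustrative values only).
angles = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
arms = np.stack([np.cos(angles), np.sin(angles)], axis=1)
theta_hat, regret = linear_ts(arms, np.array([1.0, 0.5]))
print("cumulative regret:", regret.sum())
```

Each round draws a perturbed parameter and plays the arm that is optimal for it. The draw is "optimistic" when that arm's value under the sampled parameter is at least the true optimal value, which is the event the abstract refers to.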
Pages: 176-184
Page count: 9