Linear Thompson Sampling Revisited

Citations: 0
Authors
Abeille, Marc [1]
Lazaric, Alessandro [1]
Affiliation
[1] Inria Lille Nord Europe, Team SequeL, Villeneuve d'Ascq, France
Source
ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 54 | 2017 / Vol. 54
Keywords
BANDIT; REGRET;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting. While we obtain a regret bound of order $O(d^{3/2}\sqrt{T})$ as in previous results, the proof sheds new light on the functioning of TS. We leverage the structure of the problem to show how the regret is related to the sensitivity (i.e., the gradient) of the objective function and how selecting optimal arms associated with optimistic parameters controls it. Thus we show that TS can be seen as a generic randomized algorithm where the sampling distribution is designed to have a fixed probability of being optimistic, at the cost of an additional $\sqrt{d}$ regret factor compared to a UCB-like approach. Furthermore, we show that our proof can be readily applied to regularized linear optimization and generalized linear model problems.
Pages: 176-184
Page count: 9
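
The abstract describes TS for stochastic linear bandits as a randomized algorithm: it samples a parameter around a regularized least-squares estimate and then plays the arm that is optimal for the sampled parameter. The following is a minimal sketch of that loop, not the authors' implementation. The function name `linear_thompson_sampling`, its arguments, the Gaussian sampling distribution, and the default inflation `scale = sqrt(d)` are all illustrative assumptions. The abstract mentions an extra $\sqrt{d}$ regret factor but does not specify the sampling distribution or the confidence radius.

```python
import numpy as np


def linear_thompson_sampling(arms, theta_star, horizon, noise_std=0.1,
                             reg=1.0, scale=None, seed=0):
    """Run a sketch of linear TS on a finite arm set.

    Returns the cumulative pseudo-regret after each round.
    All parameter names and defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_arms, d = arms.shape
    if scale is None:
        # Assumed inflation of the sampling width; a stand-in for the
        # confidence radius used in the paper's analysis.
        scale = np.sqrt(d)

    V = reg * np.eye(d)           # regularized design matrix
    b = np.zeros(d)               # running sum of reward * feature
    means = arms @ theta_star     # true expected reward of every arm
    best = means.max()

    regret = np.zeros(horizon)
    total = 0.0
    for t in range(horizon):
        # Regularized least-squares estimate of the unknown parameter.
        theta_hat = np.linalg.solve(V, b)

        # Draw theta_tilde ~ N(theta_hat, scale^2 * V^{-1}).
        # With V = L L^T, the vector L^{-T} eta has covariance V^{-1}.
        L = np.linalg.cholesky(V)
        eta = rng.standard_normal(d)
        theta_tilde = theta_hat + scale * np.linalg.solve(L.T, eta)

        # Play the arm that is optimal for the sampled parameter.
        a = int(np.argmax(arms @ theta_tilde))
        x = arms[a]
        reward = means[a] + noise_std * rng.standard_normal()

        # Update the sufficient statistics and the pseudo-regret.
        V += np.outer(x, x)
        b += reward * x
        total += best - means[a]
        regret[t] = total
    return regret


# Usage on a small synthetic problem (all values are made up).
arm_rng = np.random.default_rng(1)
arms = arm_rng.standard_normal((20, 3))
arms /= np.linalg.norm(arms, axis=1, keepdims=True)   # unit-norm arms
theta_star = np.array([0.6, -0.3, 0.5])
regret = linear_thompson_sampling(arms, theta_star, horizon=500)
print(f"cumulative pseudo-regret after 500 rounds: {regret[-1]:.3f}")
```

Setting `scale=1.0` instead of the default gives a narrower sampling distribution, which is one way to explore empirically the trade-off between the probability of being optimistic and the regret that the abstract refers to.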