An Improved Regret Bound for Thompson Sampling in the Gaussian Linear Bandit Setting

Cited by: 0
Authors
Kalkanli, Cem [1 ]
Ozgur, Ayfer [1 ]
Affiliations
[1] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
Source
2020 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT) | 2020
Keywords
Thompson sampling; Gaussian linear bandit; Bayesian regret bounds; the Cauchy-Schwarz inequality;
DOI
10.1109/isit44484.2020.9174371
CLC classification
TP301 [Theory, Methods];
Subject classification
081202 ;
Abstract
Thompson sampling has been of significant recent interest due to its wide range of applicability to online learning problems and its good empirical and theoretical performance. In this paper, we analyze the performance of Thompson sampling in the canonical Gaussian linear bandit setting. We prove that the Bayesian regret of Thompson sampling in this setting is bounded by O(sqrt(T log(T))), improving on an earlier bound of O(sqrt(T) log(T)) in the literature for the case of an infinite, compact action set. Our proof relies on a Cauchy-Schwarz-type inequality which may be of interest in its own right.
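The algorithm analyzed in the abstract can be illustrated with a minimal sketch of Thompson sampling in a Gaussian linear bandit: maintain a Gaussian posterior over the unknown parameter, sample from it each round, and act greedily with respect to the sample. This is a toy instance under assumed dimensions, a finite action set standing in for the paper's compact one, and unit noise variance; it is not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy instance (dimensions and arm count are assumptions):
d, T = 3, 2000
theta_star = rng.normal(size=d)                 # true parameter, drawn from the N(0, I) prior
actions = rng.normal(size=(50, d))
actions /= np.linalg.norm(actions, axis=1, keepdims=True)  # unit-norm arms
noise_sd = 1.0

# Gaussian posterior over theta under the N(0, I) prior and Gaussian noise
Sigma_inv = np.eye(d)   # posterior precision
b = np.zeros(d)         # running sum of x_t * r_t (scaled by noise precision)

best_mean = (actions @ theta_star).max()
regret = 0.0
for t in range(T):
    Sigma = np.linalg.inv(Sigma_inv)
    mu = Sigma @ b
    theta_t = rng.multivariate_normal(mu, Sigma)    # posterior sample
    x = actions[np.argmax(actions @ theta_t)]       # greedy action for the sample
    r = x @ theta_star + noise_sd * rng.normal()    # noisy linear reward
    Sigma_inv += np.outer(x, x) / noise_sd**2       # conjugate Gaussian update
    b += x * r / noise_sd**2
    regret += best_mean - x @ theta_star            # per-round (pseudo-)regret

avg_regret = regret / T
```

Under the paper's result, the cumulative Bayesian regret of this procedure grows like sqrt(T log T), so the average per-round regret `avg_regret` should shrink toward zero as T grows.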
Pages: 2783-2788
Page count: 6
Related papers
50 items in total
  • [21] Society of Agents: Regret Bounds of Concurrent Thompson Sampling
    Chen, Yan
    Dong, Perry
    Bai, Qinxun
    Dimakopoulou, Maria
    Xu, Wei
    Zhou, Zhengyuan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [22] Feedback graph regret bounds for Thompson Sampling and UCB
    Lykouris, Thodoris
    Tardos, Eva
    Wali, Drishti
    ALGORITHMIC LEARNING THEORY, VOL 117, 2020, 117 : 592 - 614
  • [23] Regret Bounds for Expected Improvement Algorithms in Gaussian Process Bandit Optimization
    Tran-The, Hung
    Gupta, Sunil
    Rana, Santu
    Venkatesh, Svetha
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [24] Bandit Algorithms Based on Thompson Sampling for Bounded Reward Distributions
    Riou, Charles
    Honda, Junya
    ALGORITHMIC LEARNING THEORY, VOL 117, 2020, 117 : 777 - 826
  • [25] The Choice of Noninformative Priors for Thompson Sampling in Multiparameter Bandit Models
    Lee, Jongyeong
    Chiang, Chao-Kai
    Sugiyama, Masashi
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 12, 2024, : 13383 - 13390
  • [26] A Thompson Sampling Approach to Unifying Causal Inference and Bandit Learning
    Xu, Hanxuan
    Xie, Hong
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2023, PT II, 2023, 13936 : 255 - 266
  • [27] An improved upper bound on the expected regret of UCB-type policies for a matching-selection bandit problem
    Watanabe, Ryo
    Nakamura, Atsuyoshi
    Kudo, Mineichi
    OPERATIONS RESEARCH LETTERS, 2015, 43 (06) : 558 - 563
  • [28] Linear Thompson Sampling Revisited
    Abeille, Marc
    Lazaric, Alessandro
    ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 54, 2017, 54 : 176 - 184
  • [29] Improved algorithms for bandit with graph feedback via regret decomposition
    He, Yuchen
    Zhang, Chihao
    THEORETICAL COMPUTER SCIENCE, 2023, 979
  • [30] Linear Thompson sampling revisited
    Abeille, Marc
    Lazaric, Alessandro
    ELECTRONIC JOURNAL OF STATISTICS, 2017, 11 (02): : 5165 - 5197