An Improved Regret Bound for Thompson Sampling in the Gaussian Linear Bandit Setting

Cited by: 0
Authors
Kalkanli, Cem [1 ]
Ozgur, Ayfer [1 ]
Affiliations
[1] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
Keywords
Thompson sampling; Gaussian linear bandit; Bayesian regret bounds; Cauchy-Schwarz inequality
DOI
10.1109/isit44484.2020.9174371
CLC Number
TP301 [Theory, Methods]
Subject Classification Code
081202
Abstract
Thompson sampling has attracted significant recent interest due to its wide applicability to online learning problems and its good empirical and theoretical performance. In this paper, we analyze the performance of Thompson sampling in the canonical Gaussian linear bandit setting. We prove that the Bayesian regret of Thompson sampling in this setting is bounded by O(√(T log(T))), improving on an earlier bound of O(√T log(T)) in the literature for the case of an infinite and compact action set. Our proof relies on a Cauchy-Schwarz type inequality which can be of interest in its own right.
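The abstract describes the setting but gives no pseudocode; the following is a minimal illustrative sketch of Thompson sampling in a Gaussian linear bandit, not the paper's analysis. It assumes a finite action set (the paper treats infinite, compact action sets), a standard normal prior on the parameter, known noise level sigma, and arbitrary illustration values for the dimension d, action count K, and horizon T.

```python
# Minimal Thompson sampling sketch for a Gaussian linear bandit.
# Illustrative only: finite action set, N(0, I) prior, and known noise
# variance are assumptions for this sketch, not details from the paper.
import numpy as np

rng = np.random.default_rng(0)

d, K, T = 5, 50, 2000           # dimension, number of actions, horizon (illustration values)
sigma = 1.0                     # known Gaussian noise standard deviation
actions = rng.normal(size=(K, d))
actions /= np.linalg.norm(actions, axis=1, keepdims=True)  # actions on the unit sphere

theta_star = rng.normal(size=d)           # true parameter, drawn from the N(0, I) prior

# Gaussian posterior over theta: track the precision matrix and b = (1/sigma^2) * X^T r.
precision = np.eye(d)
b = np.zeros(d)

best_mean = actions @ theta_star
regret = 0.0
for t in range(T):
    cov = np.linalg.inv(precision)
    mean = cov @ b
    theta_sample = rng.multivariate_normal(mean, cov)   # sample from the posterior
    a_idx = int(np.argmax(actions @ theta_sample))      # act greedily w.r.t. the sample
    a = actions[a_idx]
    reward = a @ theta_star + sigma * rng.normal()      # noisy linear reward
    # Standard Bayesian linear-regression update of the Gaussian posterior.
    precision += np.outer(a, a) / sigma**2
    b += reward * a / sigma**2
    regret += best_mean.max() - best_mean[a_idx]         # accumulate pseudo-regret

print(f"cumulative regret after T={T} rounds: {regret:.2f}")
```

Plotting the cumulative regret against √(T log T) over several runs gives a quick empirical sense of the sublinear growth that the paper's bound quantifies.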
Pages: 2783 - 2788
Number of pages: 6
Related Papers
50 records in total
  • [1] A Thompson Sampling Algorithm With Logarithmic Regret for Unimodal Gaussian Bandit
    Yang, Long
    Li, Zhao
    Hu, Zehong
    Ruan, Shasha
    Pan, Gang
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (09) : 5332 - 5341
  • [2] Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems
    Jung, Young Hun
    Tewari, Ambuj
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [3] Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems
    Abeille, Marc
    Lazaric, Alessandro
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [4] Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting
    Srinivas, Niranjan
    Krause, Andreas
    Kakade, Sham M.
    Seeger, Matthias W.
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2012, 58 (05) : 3250 - 3265
  • [5] Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning
    Moradipari, Ahmadreza
    Pedramfar, Mohammad
    Zini, Modjtaba Shokrian
    Aggarwal, Vaneet
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [6] Thompson Sampling for the Multinomial Logit Bandit
    Agrawal, Shipra
    Avadhanula, Vashist
    Goyal, Vineet
    Zeevi, Assaf
    MATHEMATICS OF OPERATIONS RESEARCH, 2025
  • [7] Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays
    Komiyama, Junpei
    Honda, Junya
    Nakagawa, Hiroshi
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 37, 2015, 37 : 1152 - 1161
  • [8] Regret lower bound and optimal algorithm for high-dimensional contextual linear bandit
    Li, Ke
    Yang, Yun
    Narisetty, Naveen N.
    ELECTRONIC JOURNAL OF STATISTICS, 2021, 15 (02): : 5652 - 5695
  • [9] Self-accelerated Thompson sampling with near-optimal regret upper bound
    Zhu, Zhenyu
    Huang, Liusheng
    Xu, Hongli
    NEUROCOMPUTING, 2020, 399 : 37 - 47
  • [10] Regret Bounds for Safe Gaussian Process Bandit Optimization
    Amani, Sanae
    Alizadeh, Mahnoosh
    Thrampoulidis, Christos
    2021 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2021, : 527 - 532