An Improved Regret Bound for Thompson Sampling in the Gaussian Linear Bandit Setting

Cited: 0
Authors
Kalkanli, Cem [1]
Ozgur, Ayfer [1]
Affiliations
[1] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
Source
2020 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT) | 2020
Keywords
Thompson sampling; Gaussian linear bandit; Bayesian regret bounds; the Cauchy-Schwarz inequality
DOI
10.1109/ISIT44484.2020.9174371
CLC Classification
TP301 [Theory, Methods]
Discipline Code
081202
Abstract
Thompson sampling has been of significant recent interest due to its wide range of applicability to online learning problems and its good empirical and theoretical performance. In this paper, we analyze the performance of Thompson sampling in the canonical Gaussian linear bandit setting. We prove that the Bayesian regret of Thompson sampling in this setting is bounded by O(sqrt(T log(T))), improving on an earlier bound of O(sqrt(T) log(T)) in the literature for the case of an infinite, compact action set. Our proof relies on a Cauchy-Schwarz type inequality which can be of interest in its own right.
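The setting the abstract describes — a Gaussian prior over an unknown parameter, linear rewards, and actions chosen by sampling from the posterior — can be sketched in a few lines. The following is a minimal illustrative implementation, not the paper's analysis: the paper treats infinite compact action sets, while this sketch uses a finite action set, unit noise variance, and an identity prior covariance purely for simplicity.

```python
import numpy as np

# Minimal sketch of Thompson sampling for a Gaussian linear bandit.
# Assumptions (not from the paper): finite action set, N(0, I) prior,
# unit-variance Gaussian reward noise.
rng = np.random.default_rng(0)

d, T = 3, 500                        # dimension, horizon
theta_star = rng.normal(size=d)      # unknown parameter (a draw from the prior)
actions = rng.normal(size=(20, d))   # fixed finite action set
noise_sd = 1.0

# Conjugate Gaussian posterior tracked via precision matrix B and vector f:
# posterior is N(B^{-1} f, B^{-1}).
B = np.eye(d)
f = np.zeros(d)

opt = (actions @ theta_star).max()   # best achievable mean reward
regret = 0.0

for t in range(T):
    # Sample a parameter from the current posterior.
    cov = np.linalg.inv(B)
    theta_t = rng.multivariate_normal(cov @ f, cov)
    # Act greedily with respect to the sampled parameter.
    a = actions[np.argmax(actions @ theta_t)]
    reward = a @ theta_star + noise_sd * rng.normal()
    # Conjugate posterior update after observing (a, reward).
    B += np.outer(a, a) / noise_sd**2
    f += a * reward / noise_sd**2
    regret += opt - a @ theta_star

print(regret)
```

Under the paper's result, the Bayesian (prior-averaged) cumulative regret of this procedure grows as O(sqrt(T log T)), so the per-round regret of the sketch above vanishes as T grows.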
Pages: 2783-2788
Page count: 6
Related Papers
50 items total
  • [31] First-Order Bayesian Regret Analysis of Thompson Sampling
    Bubeck, Sebastien
    Sellke, Mark
    ALGORITHMIC LEARNING THEORY, VOL 117, 2020, 117 : 196 - 233
  • [32] First-Order Bayesian Regret Analysis of Thompson Sampling
    Bubeck, Sebastien
    Sellke, Mark
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2023, 69 (03) : 1795 - 1823
  • [33] Globally Informative Thompson Sampling for Structured Bandit Problems with Application to CrowdTranscoding
    Liu, Xingchi
    Derakhshani, Mahsa
    Zhu, Ziming
    Lambotharan, Sangarapillai
    3RD INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION (IEEE ICAIIC 2021), 2021, : 210 - 215
  • [34] Online (Multinomial) Logistic Bandit: Improved Regret and Constant Computation Cost
    Zhang, Yu-Jie
    Sugiyama, Masashi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [35] Improved Regret Bounds for Online Kernel Selection Under Bandit Feedback
    Li, Junfan
    Liao, Shizhong
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT IV, 2023, 13716 : 333 - 348
  • [36] Improved Regret Bounds for Projection-free Bandit Convex Optimization
    Garber, Dan
    Kretzu, Ben
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 2196 - 2205
  • [37] Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback
    Cassel, Asaf
    Luo, Haipeng
    Rosenberg, Aviv
    Sotnikov, Dmitry
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, 2024, 235
  • [39] LINEAR THOMPSON SAMPLING UNDER UNKNOWN LINEAR CONSTRAINTS
    Moradipari, Ahmadreza
    Alizadeh, Mahnoosh
    Thrampoulidis, Christos
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 3392 - 3396
  • [40] Multi-Agent Thompson Sampling for Bandit Applications with Sparse Neighbourhood Structures
    Verstraeten, Timothy
    Bargiacchi, Eugenio
    Libin, Pieter J. K.
    Helsen, Jan
    Roijers, Diederik M.
    Nowe, Ann
    SCIENTIFIC REPORTS, 2020, 10 (01)