An Improved Regret Bound for Thompson Sampling in the Gaussian Linear Bandit Setting

Citations: 0
Authors
Kalkanli, Cem [1]
Ozgur, Ayfer [1]
Affiliations
[1] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
Keywords
Thompson sampling; Gaussian linear bandit; Bayesian regret bounds; the Cauchy-Schwarz inequality
DOI
10.1109/isit44484.2020.9174371
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Thompson sampling has been of significant recent interest due to its wide applicability to online learning problems and its good empirical and theoretical performance. In this paper, we analyze the performance of Thompson sampling in the canonical Gaussian linear bandit setting. We prove that the Bayesian regret of Thompson sampling in this setting is bounded by O(√T log(T)), improving on an earlier bound of O(√T log^{3/2}(T)) in the literature for the case of an infinite, compact action set. Our proof relies on a Cauchy-Schwarz type inequality which may be of interest in its own right.
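For readers unfamiliar with the setting, the following is a minimal illustrative sketch of Thompson sampling in a Gaussian linear bandit. It is not taken from the paper. It assumes a finite action set (the paper treats an infinite, compact one), a standard normal prior N(0, I) on the unknown parameter, and Gaussian reward noise with known standard deviation. The function name and all parameters are hypothetical.

```python
import numpy as np


def thompson_sampling_gaussian_linear(actions, theta_true, horizon,
                                      noise_std=1.0, rng=None):
    """Return cumulative regret of Thompson sampling over `horizon` rounds.

    Assumptions (not from the paper): finite action set given as the rows
    of `actions`, prior theta ~ N(0, I), known noise standard deviation.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = actions.shape[1]
    precision = np.eye(d)   # posterior precision matrix, starts at the prior
    b = np.zeros(d)         # precision-weighted sum of observed rewards
    best_mean = np.max(actions @ theta_true)

    cumulative = np.zeros(horizon)
    total = 0.0
    for t in range(horizon):
        # Gaussian posterior over theta given the data so far.
        cov = np.linalg.inv(precision)
        cov = (cov + cov.T) / 2.0          # guard against round-off asymmetry
        mean = cov @ b

        # Sample a parameter from the posterior and act greedily on it.
        theta_sample = rng.multivariate_normal(mean, cov)
        action = actions[np.argmax(actions @ theta_sample)]

        # Observe a noisy linear reward.
        reward = action @ theta_true + noise_std * rng.standard_normal()

        # Conjugate Bayesian linear-regression update.
        precision += np.outer(action, action) / noise_std**2
        b += action * reward / noise_std**2

        total += best_mean - action @ theta_true
        cumulative[t] = total
    return cumulative
```

Averaging the returned curve over many draws of `theta_true` from the prior gives an empirical estimate of the Bayesian regret, which is the quantity the abstract bounds.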
Pages: 2783-2788
Page count: 6
Related papers
50 in total
  • [41] Optimality of Thompson Sampling for Gaussian Bandits Depends on Priors
    Honda, Junya
    Takemura, Akimichi
    ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 33, 2014, 33 : 375 - 383
  • [42] Thompson Sampling Based Mechanisms for Stochastic Multi-Armed Bandit Problems
    Ghalme, Ganesh
    Jain, Shweta
    Gujar, Sujit
    Narahari, Y.
    AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2017, : 87 - 95
  • [43] Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization
    Yuan, Hui
    Ni, Chengzhuo
    Wang, Huazheng
    Zhang, Xuezhou
    Cong, Le
    Szepesvari, Csaba
    Wang, Mengdi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [44] Prior-free and prior-dependent regret bounds for Thompson Sampling
    Bubeck, Sebastien
    Liu, Che-Yu
    2014 48TH ANNUAL CONFERENCE ON INFORMATION SCIENCES AND SYSTEMS (CISS), 2014,
  • [45] Safe Linear Thompson Sampling With Side Information
    Moradipari, Ahmadreza
    Amani, Sanae
    Alizadeh, Mahnoosh
    Thrampoulidis, Christos
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2021, 69 : 3755 - 3767
  • [46] Doubly Robust Thompson Sampling with Linear Payoffs
    Kim, Wonyoung
    Kim, Gi-Soo
    Paik, Myunghee Cho
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [47] Control of Unknown Linear Systems with Thompson Sampling
    Ouyang, Yi
    Gagrani, Mukul
    Jain, Rahul
    2017 55TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2017, : 1198 - 1205
  • [48] Thompson Sampling Based Multi-Armed-Bandit Mechanism Using Neural Networks
    Manisha, Padala
    Gujar, Sujit
    AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019, : 2111 - 2113
  • [49] Analysis of Thompson Sampling for Combinatorial Multi-armed Bandit with Probabilistically Triggered Arms
    Huyuk, Alihan
    Tekin, Cem
    22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89
  • [50] Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm
    Komiyama, Junpei
    Honda, Junya
    Nakagawa, Hiroshi
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48