Thompson Sampling for Combinatorial Semi-Bandits

Cited by: 0
Authors
Wang, Siwei [1]
Chen, Wei [2]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Microsoft Res, Beijing, Peoples R China
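Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract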
We study the application of Thompson sampling (TS) to the stochastic combinatorial multi-armed bandit (CMAB) framework. We analyze the standard TS algorithm for general CMAB and obtain the first distribution-dependent regret bound for TS in this setting, O(m log T / Δ_min), where m is the number of arms, T is the time horizon, and Δ_min is the minimum gap between the expected reward of the optimal solution and that of any non-optimal solution. We also show that an approximate oracle cannot be used in the TS algorithm, even for the classical MAB problem. We then extend the analysis to the matroid bandit, a special case of CMAB for which we can remove the independence assumption across arms and achieve a better regret bound. Finally, we report experiments comparing the regret of the CUCB and CTS algorithms.
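To make the setting in the abstract concrete, here is a minimal sketch of combinatorial Thompson sampling under assumptions that this record does not state: Bernoulli arm rewards, independent Beta(1, 1) priors, and a top-k selection rule standing in for the exact oracle (top-k corresponds to a uniform matroid). The function name `cts_top_k` and all parameters are hypothetical and chosen for illustration only; this is not presented as the authors' implementation.

```python
import random


def cts_top_k(means, k, horizon, seed=0):
    """Sketch of combinatorial Thompson sampling with semi-bandit feedback.

    Assumptions (not taken from the record): Bernoulli arms with the given
    means, Beta(1, 1) priors, and an exact oracle that picks the k arms
    with the largest sampled parameters.
    """
    rng = random.Random(seed)
    m = len(means)
    alpha = [1] * m  # 1 + number of observed successes per arm
    beta = [1] * m   # 1 + number of observed failures per arm
    best = sum(sorted(means, reverse=True)[:k])  # value of the optimal super arm
    regret = 0.0

    for _ in range(horizon):
        # Draw one sample per arm from its current posterior.
        theta = [rng.betavariate(alpha[i], beta[i]) for i in range(m)]
        # Exact oracle for the top-k constraint: take the k largest samples.
        chosen = sorted(range(m), key=lambda i: theta[i], reverse=True)[:k]
        # Semi-bandit feedback: observe each chosen arm's outcome separately.
        for i in chosen:
            x = 1 if rng.random() < means[i] else 0
            alpha[i] += x
            beta[i] += 1 - x
        # Accumulate pseudo-regret against the optimal super arm.
        regret += best - sum(means[i] for i in chosen)

    return regret, alpha, beta


if __name__ == "__main__":
    total_regret, _, _ = cts_top_k([0.9, 0.8, 0.3, 0.2, 0.1], k=2, horizon=2000)
    print(f"cumulative pseudo-regret: {total_regret:.2f}")
```

The oracle here is exact by construction, which matches the abstract's point that the analysis relies on an exact rather than an approximate oracle.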
Pages: 9
Related papers
50 in total
  • [11] A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs
    van der Hoeven, Dirk
    Zierahn, Lukas
    Lancewicki, Tal
    Rosenberg, Aviv
    Cesa-Bianchi, Nicolo
    THIRTY SIXTH ANNUAL CONFERENCE ON LEARNING THEORY, VOL 195, 2023, 195
  • [12] Efficient Ordered Combinatorial Semi-Bandits for Whole-Page Recommendation
    Wang, Yingfei
    Ouyang, Hua
    Wang, Chu
    Chen, Jianhui
    Asamov, Tsvetan
    Chang, Yi
    THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 2746 - 2753
  • [13] An Arm-Wise Randomization Approach to Combinatorial Linear Semi-Bandits
    Takemura, Kei
    Ito, Shinji
    2019 19TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2019), 2019, : 1318 - 1323
  • [14] Statistically Efficient, Polynomial-Time Algorithms for Combinatorial Semi-Bandits
    Cuvelier, Thibaut
    Combes, Richard
    Gourdin, Eric
    PROCEEDINGS OF THE ACM ON MEASUREMENT AND ANALYSIS OF COMPUTING SYSTEMS, 2021, 5 (01)
  • [15] Importance Weighting Without Importance Weights: An Efficient Algorithm for Combinatorial Semi-Bandits
    Neu, Gergely
    Bartok, Gabor
    JOURNAL OF MACHINE LEARNING RESEARCH, 2016, 17 : 1 - 21
  • [16] An Efficient Algorithm for Cooperative Semi-Bandits
    Della Vecchia, Riccardo
    Cesari, Tommaso R.
    ALGORITHMIC LEARNING THEORY, VOL 132, 2021, 132
  • [17] Improving Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms and Its Applications
    Wang, Qinshi
    Chen, Wei
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [18] Closing the Computational-Statistical Gap in Best Arm Identification for Combinatorial Semi-bandits
    Tzeng, Ruo-Chun
    Wang, Po-An
    Proutiere, Alexandre
    Lu, Chi-Jen
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [19] Recurrent Submodular Welfare and Matroid Blocking Semi-Bandits
    Papadigenopoulos, Orestis
    Caramanis, Constantine
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [20] Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions
    Takemura, Kei
    Ito, Shinji
    Hatano, Daisuke
    Sumita, Hanna
    Fukunaga, Takuro
    Kakimura, Naonori
    Kawarabayashi, Ken-ichi
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 9791 - 9798