Thompson Sampling for Combinatorial Semi-Bandits

Cited by: 0
Authors
Wang, Siwei [1]
Chen, Wei [2]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Microsoft Res, Beijing, Peoples R China
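Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract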
We study the application of the Thompson sampling (TS) methodology to the stochastic combinatorial multi-armed bandit (CMAB) framework. We analyze the standard TS algorithm for general CMAB and obtain the first distribution-dependent regret bound of O(m log T / Δ_min) for TS in this setting, where m is the number of arms, T is the time horizon, and Δ_min is the minimum gap between the expected reward of the optimal solution and that of any non-optimal solution. We also show that an approximate oracle cannot be used in the TS algorithm, even for MAB problems. We then extend the analysis to the matroid bandit, a special case of CMAB, for which we can remove the independence assumption across arms and achieve a better regret bound. Finally, we present experiments comparing the regret of the CUCB and CTS algorithms.
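The abstract does not spell out the algorithm, so the following is only a minimal sketch of combinatorial Thompson sampling under stated assumptions: Bernoulli base arms, Beta(1, 1) priors, a linear reward, and an exact top-k oracle (a uniform matroid). The names `top_k_oracle` and `cts` and all parameter values are hypothetical and are not taken from the paper.

```python
import random


def top_k_oracle(theta, k):
    """Exact oracle for a linear reward (assumed): the k arms with the largest sampled means."""
    return sorted(range(len(theta)), key=lambda i: theta[i], reverse=True)[:k]


def cts(means, k, horizon, seed=0):
    """Illustrative combinatorial Thompson sampling with semi-bandit feedback.

    means   -- true Bernoulli means of the m base arms (unknown to the learner)
    k       -- number of base arms played per round
    horizon -- number of rounds T
    Returns the cumulative expected regret and the Beta posterior parameters.
    """
    rng = random.Random(seed)
    m = len(means)
    a = [1] * m  # Beta posterior: 1 + observed successes
    b = [1] * m  # Beta posterior: 1 + observed failures
    best = sum(sorted(means, reverse=True)[:k])  # expected reward of the optimal set
    regret = 0.0
    for _ in range(horizon):
        # Draw one sample per base arm from its posterior.
        theta = [rng.betavariate(a[i], b[i]) for i in range(m)]
        # Ask the oracle for the best set under the sampled means.
        chosen = top_k_oracle(theta, k)
        regret += best - sum(means[i] for i in chosen)
        # Semi-bandit feedback: observe the outcome of every played arm.
        for i in chosen:
            x = 1 if rng.random() < means[i] else 0
            a[i] += x
            b[i] += 1 - x
    return regret, a, b


if __name__ == "__main__":
    total_regret, _, _ = cts([0.9, 0.8, 0.2, 0.1], k=2, horizon=2000)
    print(f"cumulative expected regret: {total_regret:.2f}")
```

The oracle here is exact on purpose: the abstract states that an approximate oracle cannot be used in TS, so a sketch with a heuristic oracle would not be covered by the stated bound.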
Pages: 9
Related papers
50 in total
  • [21] Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously
    Zimmert, Julian
    Luo, Haipeng
    Wei, Chen-Yu
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [22] Exploiting Structure of Uncertainty for Efficient Matroid Semi-Bandits
    Perrault, Pierre
    Perchet, Vianney
    Valko, Michal
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [23] Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms
    Liu, Xutong
    Zuo, Jinhang
    Wang, Siwei
    Joe-Wong, Carlee
    Lui, John C. S.
    Chen, Wei
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022
  • [24] A Thompson Sampling Algorithm for Cascading Bandits
    Cheung, Wang Chi
    Tan, Vincent Y. F.
    Zhong, Zixin
    22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89 : 438 - 447
  • [25] Thompson Sampling for Linearly Constrained Bandits
    Saxena, Vidit
    Gonzalez, Joseph E.
    Jalden, Joakim
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108
  • [26] Double Thompson Sampling for Dueling Bandits
    Wu, Huasen
    Liu, Xin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [27] On the Performance of Thompson Sampling on Logistic Bandits
    Dong, Shi
    Ma, Tengyu
    Van Roy, Benjamin
    CONFERENCE ON LEARNING THEORY, VOL 99, 2019, 99
  • [28] Thompson Sampling Algorithms for Cascading Bandits
    Zhong, Zixin
    Cheung, Wang Chi
    Tan, Vincent Y. F.
    JOURNAL OF MACHINE LEARNING RESEARCH, 2021, 22
  • [29] Thompson Sampling on Symmetric α-Stable Bandits
    Dubey, Abhimanyu
    Pentland, Alex Sandy
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 5715 - 5721
  • [30] Thompson Sampling for Bandits with Clustered Arms
    Carlsson, Emil
    Dubhashi, Devdatt
    Johansson, Fredrik D.
    PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 2212 - 2218