Thompson Sampling for Combinatorial Semi-Bandits

Cited by: 0
Authors
Wang, Siwei [1]
Chen, Wei [2]
Affiliations
[1] Tsinghua University, Beijing, People's Republic of China
[2] Microsoft Research, Beijing, People's Republic of China
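Keywords
None listed
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405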
Abstract
We study the application of the Thompson sampling (TS) methodology to the stochastic combinatorial multi-armed bandit (CMAB) framework. We analyze the standard TS algorithm for general CMAB and obtain the first distribution-dependent regret bound for TS in this setting, $O(m \log T / \Delta_{\min})$, where $m$ is the number of arms, $T$ is the time horizon, and $\Delta_{\min}$ is the minimum gap between the expected reward of the optimal solution and that of any non-optimal solution. We also show that the TS algorithm cannot work with an approximation oracle, even for ordinary MAB problems. We then extend the analysis to the matroid bandit, a special case of CMAB for which the independence assumption across arms can be removed and a better regret bound obtained. Finally, we present experiments comparing the regret of the CUCB and CTS algorithms.
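The abstract describes the TS loop for CMAB only at a high level: sample a mean for each base arm from its posterior, pass the samples to an exact oracle, play the returned super arm, and update the posteriors of the arms observed under semi-bandit feedback. Below is a minimal Python sketch of that loop on a toy instance we chose for illustration, not the paper's experimental setup. It assumes that each base arm returns an independent Bernoulli outcome, that a super arm is any set of k arms whose expected reward is the sum of its arms' means (a uniform-matroid case, so a greedy top-k oracle is exact), and that every arm starts from a Beta(1, 1) prior. A CUCB-style baseline is included for comparison. Its confidence radius sqrt(3 ln t / (2 T_i)) is our assumption, and the function names `top_k_oracle`, `cts_top_k`, and `cucb_top_k` are hypothetical.

```python
import math
import random


def top_k_oracle(scores, k):
    """Exact oracle for the toy top-k problem: indices of the k largest scores."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def cts_top_k(means, k, horizon, seed=0):
    """Thompson sampling sketch for a top-k semi-bandit with Bernoulli arms.

    Keeps a Beta(alpha_i, beta_i) posterior per base arm, samples one value
    per arm each round, and plays the oracle's choice on the samples.
    Returns the cumulative expected regret over `horizon` rounds.
    """
    rng = random.Random(seed)
    m = len(means)
    alpha = [1] * m  # Beta(1, 1) prior: 1 + observed successes
    beta = [1] * m   # Beta(1, 1) prior: 1 + observed failures
    best = sum(sorted(means, reverse=True)[:k])
    regret = 0.0
    for _ in range(horizon):
        # Sample a plausible mean for every base arm from its posterior.
        theta = [rng.betavariate(alpha[i], beta[i]) for i in range(m)]
        # The exact oracle picks the best super arm under the sampled means.
        chosen = top_k_oracle(theta, k)
        # Semi-bandit feedback: each chosen arm reveals its own outcome.
        for i in chosen:
            if rng.random() < means[i]:
                alpha[i] += 1
            else:
                beta[i] += 1
        regret += best - sum(means[i] for i in chosen)
    return regret


def cucb_top_k(means, k, horizon, seed=0):
    """CUCB-style baseline: empirical mean plus an assumed confidence radius."""
    rng = random.Random(seed)
    m = len(means)
    counts = [0] * m
    sums = [0.0] * m
    best = sum(sorted(means, reverse=True)[:k])
    regret = 0.0
    for t in range(1, horizon + 1):
        # Arms never observed get an infinite index, so they are tried first.
        index = [
            math.inf if counts[i] == 0
            else sums[i] / counts[i] + math.sqrt(3 * math.log(t) / (2 * counts[i]))
            for i in range(m)
        ]
        chosen = top_k_oracle(index, k)
        for i in chosen:
            counts[i] += 1
            sums[i] += 1.0 if rng.random() < means[i] else 0.0
        regret += best - sum(means[i] for i in chosen)
    return regret


if __name__ == "__main__":
    toy_means = [0.9, 0.8, 0.5, 0.4, 0.3]  # hypothetical arm means
    print("CTS regret: ", cts_top_k(toy_means, k=2, horizon=3000))
    print("CUCB regret:", cucb_top_k(toy_means, k=2, horizon=3000))
```

The sketch calls an exact oracle on the sampled means. This agrees with the abstract's observation that TS cannot be paired with an approximation oracle. Any regret values the script prints come from this toy instance only and should not be read as the paper's experimental results.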
Pages: 9