Thompson Sampling for Combinatorial Semi-Bandits

Cited by: 0
Authors
Wang, Siwei [1]
Chen, Wei [2]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Microsoft Res, Beijing, Peoples R China
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We study the application of the Thompson sampling (TS) methodology to the stochastic combinatorial multi-armed bandit (CMAB) framework. We analyze the standard TS algorithm for the general CMAB and obtain the first distribution-dependent regret bound of O(m log T / Δ_min) for TS under general CMAB, where m is the number of arms, T is the time horizon, and Δ_min is the minimum gap between the expected reward of the optimal solution and that of any non-optimal solution. We also show that one cannot use an approximate oracle in the TS algorithm, even for MAB problems. We then extend the analysis to the matroid bandit, a special case of CMAB for which we can remove the independence assumption across arms and achieve a better regret bound. Finally, we report experiments comparing the regrets of the CUCB and CTS algorithms.
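The abstract describes Thompson sampling with an exact oracle under semi-bandit feedback. As a rough illustration only, the following minimal sketch runs a combinatorial TS loop on a top-k selection problem (a uniform matroid). The Beta(1, 1) priors, the Bernoulli outcomes, the top-k oracle, and the names `cts_top_k` and `means` are all assumptions made for this example; they are not taken from the paper.

```python
import random


def cts_top_k(means, k, horizon, seed=0):
    """Sketch of combinatorial Thompson sampling on a top-k problem.

    means   -- hypothetical true Bernoulli means of the base arms
    k       -- number of base arms played per round
    horizon -- number of rounds T
    Returns (cumulative pseudo-regret, alpha counts, beta counts).
    """
    rng = random.Random(seed)
    m = len(means)
    alpha = [1] * m  # Beta prior parameters (assumed uniform prior)
    beta = [1] * m
    best = sum(sorted(means, reverse=True)[:k])  # value of the optimal set
    regret = 0.0
    for _ in range(horizon):
        # Draw one posterior sample per base arm.
        theta = [rng.betavariate(alpha[i], beta[i]) for i in range(m)]
        # Exact oracle for top-k: pick the k arms with the largest samples.
        chosen = sorted(range(m), key=lambda i: theta[i], reverse=True)[:k]
        # Semi-bandit feedback: observe the outcome of every chosen arm.
        for i in chosen:
            outcome = 1 if rng.random() < means[i] else 0
            alpha[i] += outcome
            beta[i] += 1 - outcome
        regret += best - sum(means[i] for i in chosen)
    return regret, alpha, beta


if __name__ == "__main__":
    total, _, _ = cts_top_k([0.9, 0.8, 0.7, 0.3, 0.2, 0.1], k=2, horizon=2000)
    print(f"cumulative pseudo-regret after 2000 rounds: {total:.2f}")
```

The oracle here is exact, which matches the abstract's remark that an approximate oracle cannot be used with TS. Replacing the top-k step with another exact combinatorial solver would adapt the sketch to other constraint structures.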
Pages: 9
Related papers
50 in total
  • [1] Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits
    Perrault, Pierre
    Boursier, Etienne
    Perchet, Vianney
    Valko, Michal
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [2] The Hardness Analysis of Thompson Sampling for Combinatorial Semi-bandits with Greedy Oracle
    Kong, Fang
    Yang, Yueran
    Chen, Wei
    Li, Shuai
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021
  • [3] Combinatorial Semi-Bandits with Knapsacks
    Sankararaman, Karthik Abinav
    Slivkins, Aleksandrs
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 84, 2018, 84
  • [4] (Locally) Differentially Private Combinatorial Semi-Bandits
    Chen, Xiaoyu
    Zheng, Kai
    Zhou, Zixin
    Yang, Yunchang
    Chen, Wei
    Wang, Liwei
    25TH AMERICAS CONFERENCE ON INFORMATION SYSTEMS (AMCIS 2019), 2019
  • [5] (Locally) Differentially Private Combinatorial Semi-Bandits
    Chen, Xiaoyu
    Zheng, Kai
    Zhou, Zixin
    Yang, Yunchang
    Chen, Wei
    Wang, Liwei
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [6] Hybrid Regret Bounds for Combinatorial Semi-Bandits and Adversarial Linear Bandits
    Ito, Shinji
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [7] Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits
    Kveton, Branislav
    Wen, Zheng
    Ashkan, Azin
    Szepesvari, Csaba
    ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 38, 2015, 38 : 535 - 543
  • [8] Matching with semi-bandits
    Kasy, Maximilian
    Teytelboym, Alexander
    ECONOMETRICS JOURNAL, 2023, 26 (01) : 45 - 66
  • [9] Efficient Learning in Large-Scale Combinatorial Semi-Bandits
    Wen, Zheng
    Kveton, Branislav
    Ashkan, Azin
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 37, 2015, 37 : 1113 - 1122
  • [10] Asymptotically Optimal Strategies For Combinatorial Semi-Bandits in Polynomial Time
    Cuvelier, Thibaut
    Combes, Richard
    Gourdin, Eric
    ALGORITHMIC LEARNING THEORY, VOL 132, 2021, 132