Thompson Sampling for Combinatorial Semi-Bandits

Cited by: 0

Authors:

Wang, Siwei [1]

Chen, Wei [2]

Affiliations:

[1] Tsinghua Univ, Beijing, Peoples R China

[2] Microsoft Res, Beijing, Peoples R China

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80 | 2018 / Vol. 80

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We study the application of the Thompson sampling (TS) methodology to the stochastic combinatorial multi-armed bandit (CMAB) framework. We analyze the standard TS algorithm for the general CMAB and obtain the first distribution-dependent regret bound of $O(m \log T / \Delta_{\min})$ for TS under general CMAB, where $m$ is the number of arms, $T$ is the time horizon, and $\Delta_{\min}$ is the minimum gap between the expected reward of the optimal solution and that of any non-optimal solution. We also show that one cannot use an approximate oracle in the TS algorithm, even for MAB problems. We then extend the analysis to the matroid bandit, a special case of CMAB for which we can remove the independence assumption across arms and achieve a better regret bound. Finally, we use experiments to compare the regrets of the CUCB and CTS algorithms.
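As a rough illustration of the kind of algorithm the abstract describes, the following is a minimal sketch of combinatorial Thompson sampling with semi-bandit feedback. It is not taken from the paper: the Beta(1, 1) priors, the Bernoulli arm outcomes, the linear reward, and the hypothetical `top_k_oracle` (an exact oracle for choosing the k best arms, a simple matroid-style constraint) are all assumptions made for the example. An exact oracle is used because the abstract states that an approximate oracle cannot be used with TS.

```python
import random


def top_k_oracle(theta, k):
    """Hypothetical exact oracle: indices of the k largest sampled means."""
    return sorted(range(len(theta)), key=lambda i: theta[i], reverse=True)[:k]


def cts(means, k, horizon, seed=0):
    """Sketch of combinatorial TS with Beta priors (assumed, not from the source).

    means   : true Bernoulli means of the base arms (used only to simulate)
    k       : number of arms played per round
    horizon : number of rounds T
    Returns (cumulative expected regret, alpha counts, beta counts).
    """
    rng = random.Random(seed)
    m = len(means)
    alpha = [1] * m  # 1 + number of observed successes per arm
    beta = [1] * m   # 1 + number of observed failures per arm
    best = sum(sorted(means, reverse=True)[:k])  # value of the optimal super arm
    regret = 0.0
    for _ in range(horizon):
        # Draw one posterior sample per base arm.
        theta = [rng.betavariate(alpha[i], beta[i]) for i in range(m)]
        # Ask the exact oracle for the best super arm under the samples.
        action = top_k_oracle(theta, k)
        # Semi-bandit feedback: observe the outcome of every played arm.
        for i in action:
            outcome = 1 if rng.random() < means[i] else 0
            alpha[i] += outcome
            beta[i] += 1 - outcome
        regret += best - sum(means[i] for i in action)
    return regret, alpha, beta


if __name__ == "__main__":
    total_regret, _, _ = cts([0.9, 0.8, 0.5, 0.3, 0.1], k=2, horizon=2000)
    print(f"cumulative regret after 2000 rounds: {total_regret:.2f}")
```

In this sketch the posterior of each base arm is updated independently from its own observed outcome, which is what semi-bandit feedback makes possible; swapping in a different exact oracle changes the combinatorial structure without touching the sampling or update steps.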


Pages: 9

50 entries in total

[1] Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits
Perrault, Pierre
Boursier, Etienne
Perchet, Vianney
Valko, Michal
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
[2] The Hardness Analysis of Thompson Sampling for Combinatorial Semi-bandits with Greedy Oracle
Kong, Fang
Yang, Yueran
Chen, Wei
Li, Shuai
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,
[3] Combinatorial Semi-Bandits with Knapsacks
Sankararaman, Karthik Abinav
Slivkins, Aleksandrs
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 84, 2018, 84
[4] (Locally) Differentially Private Combinatorial Semi-Bandits
Chen, Xiaoyu
Zheng, Kai
Zhou, Zixin
Yang, Yunchang
Chen, Wei
Wang, Liwei
25TH AMERICAS CONFERENCE ON INFORMATION SYSTEMS (AMCIS 2019), 2019,
[5] (Locally) Differentially Private Combinatorial Semi-Bandits
Chen, Xiaoyu
Zheng, Kai
Zhou, Zixin
Yang, Yunchang
Chen, Wei
Wang, Liwei
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
[6] Hybrid Regret Bounds for Combinatorial Semi-Bandits and Adversarial Linear Bandits
Ito, Shinji
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
[7] Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits
Kveton, Branislav
Wen, Zheng
Ashkan, Azin
Szepesvari, Csaba
ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 38, 2015, 38 : 535 - 543
[8] Matching with semi-bandits
Kasy, Maximilian
Teytelboym, Alexander
ECONOMETRICS JOURNAL, 2023, 26 (01): : 45 - 66
[9] Efficient Learning in Large-Scale Combinatorial Semi-Bandits
Wen, Zheng
Kveton, Branislav
Ashkan, Azin
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 37, 2015, 37 : 1113 - 1122
[10] Asymptotically Optimal Strategies For Combinatorial Semi-Bandits in Polynomial Time
Cuvelier, Thibaut
Combes, Richard
Gourdin, Eric
ALGORITHMIC LEARNING THEORY, VOL 132, 2021, 132
