Thompson Sampling for Combinatorial Semi-Bandits

Cited by: 0

Authors:

Wang, Siwei [1]

Chen, Wei [2]

Affiliations:

[1] Tsinghua Univ, Beijing, Peoples R China

[2] Microsoft Res, Beijing, Peoples R China

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80 | 2018 / Vol. 80

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We study the application of the Thompson sampling (TS) methodology to the stochastic combinatorial multi-armed bandit (CMAB) framework. We analyze the standard TS algorithm for the general CMAB and obtain the first distribution-dependent regret bound of O(m log T / Δ_min) for TS under general CMAB, where m is the number of arms, T is the time horizon, and Δ_min is the minimum gap between the expected reward of the optimal solution and that of any non-optimal solution. We also show that one cannot use an approximate oracle in the TS algorithm, even for MAB problems. We then extend the analysis to the matroid bandit, a special case of CMAB for which we can remove the independence assumption across arms and achieve a better regret bound. Finally, we use experiments to compare the regrets of the CUCB and CTS algorithms.
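The abstract describes applying Thompson sampling with semi-bandit feedback: sample a mean estimate per arm, hand the samples to an exact oracle, play the returned set of arms, and update only the arms that were played. The following is a minimal Python sketch of that loop, not the authors' implementation. It assumes Bernoulli arm outcomes, Beta(1, 1) priors, a linear reward, and a top-k feasible set; the names `top_k_oracle` and `cts` and the pseudo-regret bookkeeping are hypothetical choices made for illustration.

```python
import random


def top_k_oracle(theta, k):
    """Exact oracle for the illustrative top-k problem (an assumption):
    return the indices of the k arms with the largest sampled means."""
    return sorted(range(len(theta)), key=lambda i: theta[i], reverse=True)[:k]


def cts(means, k, horizon, seed=0):
    """Sketch of combinatorial Thompson sampling with semi-bandit feedback.

    means   -- true Bernoulli means, used only to simulate outcomes
    k       -- number of arms played per round (assumed feasible set)
    horizon -- number of rounds T
    Returns the Beta parameters (a, b) and the cumulative pseudo-regret.
    """
    rng = random.Random(seed)
    m = len(means)
    a = [1] * m  # Beta prior: 1 + number of observed successes
    b = [1] * m  # Beta prior: 1 + number of observed failures
    best = sum(sorted(means, reverse=True)[:k])  # optimal expected reward
    regret = 0.0

    for _ in range(horizon):
        # Draw one posterior sample per arm.
        theta = [rng.betavariate(a[i], b[i]) for i in range(m)]
        # The oracle must be exact; it sees only the sampled means.
        action = top_k_oracle(theta, k)
        regret += best - sum(means[i] for i in action)
        # Semi-bandit feedback: observe and update only the played arms.
        for i in action:
            x = 1 if rng.random() < means[i] else 0
            a[i] += x
            b[i] += 1 - x

    return a, b, regret


if __name__ == "__main__":
    a, b, regret = cts([0.9, 0.8, 0.2, 0.1], k=2, horizon=500)
    print("posterior successes:", a)
    print("posterior failures: ", b)
    print("cumulative pseudo-regret:", round(regret, 2))
```

Each round adds exactly k observations to the Beta counts, so arms the oracle stops selecting keep wide posteriors while frequently played arms concentrate around their true means. Swapping in a different exact oracle (for example a maximum-weight basis routine for a matroid) would change only `top_k_oracle`.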

Pages: 9

50 items in total

[11] A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs
van der Hoeven, Dirk
Zierahn, Lukas
Lancewicki, Tal
Rosenberg, Aviv
Cesa-Bianchi, Nicolo
THIRTY SIXTH ANNUAL CONFERENCE ON LEARNING THEORY, VOL 195, 2023, 195
[12] Efficient Ordered Combinatorial Semi-Bandits for Whole-Page Recommendation
Wang, Yingfei
Ouyang, Hua
Wang, Chu
Chen, Jianhui
Asamov, Tsvetan
Chang, Yi
THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017 : 2746 - 2753
[13] An Arm-Wise Randomization Approach to Combinatorial Linear Semi-Bandits
Takemura, Kei
Ito, Shinji
2019 19TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2019), 2019 : 1318 - 1323
[14] Statistically Efficient, Polynomial-Time Algorithms for Combinatorial Semi-Bandits
Cuvelier, Thibaut
Combes, Richard
Gourdin, Eric
PROCEEDINGS OF THE ACM ON MEASUREMENT AND ANALYSIS OF COMPUTING SYSTEMS, 2021, 5 (01)
[15] Importance Weighting Without Importance Weights: An Efficient Algorithm for Combinatorial Semi-Bandits
Neu, Gergely
Bartok, Gabor
JOURNAL OF MACHINE LEARNING RESEARCH, 2016, 17 : 1 - 21
[16] An Efficient Algorithm for Cooperative Semi-Bandits
Della Vecchia, Riccardo
Cesari, Tommaso R.
ALGORITHMIC LEARNING THEORY, VOL 132, 2021, 132
[17] Improving Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms and Its Applications
Wang, Qinshi
Chen, Wei
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
[18] Closing the Computational-Statistical Gap in Best Arm Identification for Combinatorial Semi-bandits
Tzeng, Ruo-Chun
Wang, Po-An
Proutiere, Alexandre
Lu, Chi-Jen
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
[19] Recurrent Submodular Welfare and Matroid Blocking Semi-Bandits
Papadigenopoulos, Orestis
Caramanis, Constantine
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
[20] Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions
Takemura, Kei
Ito, Shinji
Hatano, Daisuke
Sumita, Hanna
Fukunaga, Takuro
Kakimura, Naonori
Kawarabayashi, Ken-ichi
THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 9791 - 9798
