Thompson Sampling for Combinatorial Semi-Bandits

Cited by: 0

Authors:

Wang, Siwei [1]

Chen, Wei [2]

Affiliations:

[1] Tsinghua University, Beijing, People's Republic of China

[2] Microsoft Research, Beijing, People's Republic of China

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80 | 2018 / Vol. 80

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We study the application of the Thompson sampling (TS) methodology to the stochastic combinatorial multi-armed bandit (CMAB) framework. We analyze the standard TS algorithm for general CMAB and obtain the first distribution-dependent regret bound of O(m log T / Δ_min) for TS in this setting, where m is the number of arms, T is the time horizon, and Δ_min is the minimum gap between the expected reward of the optimal solution and that of any non-optimal solution. We also show that an approximate oracle cannot be used in the TS algorithm, even for MAB problems. We then extend the analysis to matroid bandits, a special case of CMAB for which we can remove the independence assumption across arms and achieve a better regret bound. Finally, we report experiments comparing the regret of the CUCB and CTS algorithms.
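To make the sampling loop concrete, here is a minimal sketch of a combinatorial Thompson sampling round under assumptions that the abstract does not spell out: Bernoulli arm outcomes, independent Beta(1, 1) priors, a linear reward, and an exact top-k oracle (a uniform-matroid example). The names `top_k_oracle`, `cts`, `means`, `k`, and `horizon` are hypothetical and chosen only for illustration; this is not the authors' code.

```python
import random


def top_k_oracle(theta, k):
    """Exact oracle for a hypothetical 'choose k arms, linear reward' problem:
    returns the indices of the k largest entries of theta."""
    return sorted(range(len(theta)), key=lambda i: theta[i], reverse=True)[:k]


def cts(means, k, horizon, seed=0):
    """Sketch of combinatorial Thompson sampling with semi-bandit feedback.

    means   -- true Bernoulli means of the base arms (unknown to the learner)
    k       -- number of arms played per round (assumed constraint)
    horizon -- number of rounds T
    Returns (cumulative expected regret, Beta 'a' counts, Beta 'b' counts).
    """
    rng = random.Random(seed)
    m = len(means)
    a = [1] * m  # Beta(1, 1) prior: one pseudo-success per arm
    b = [1] * m  # and one pseudo-failure per arm
    best = sum(sorted(means, reverse=True)[:k])  # value of the optimal solution
    regret = 0.0
    for _ in range(horizon):
        # Draw one sample per arm from its current Beta posterior.
        theta = [rng.betavariate(a[i], b[i]) for i in range(m)]
        # Feed the sampled means to the (exact) oracle to pick a super arm.
        chosen = top_k_oracle(theta, k)
        # Semi-bandit feedback: observe every played arm and update its posterior.
        for i in chosen:
            outcome = 1 if rng.random() < means[i] else 0
            a[i] += outcome
            b[i] += 1 - outcome
        regret += best - sum(means[i] for i in chosen)
    return regret, a, b


if __name__ == "__main__":
    total_regret, _, _ = cts([0.9, 0.8, 0.5, 0.4, 0.2], k=2, horizon=2000)
    print(f"cumulative expected regret: {total_regret:.2f}")
```

The oracle here is exact on purpose: the abstract states that an approximate oracle cannot be used in the TS algorithm, so swapping in a heuristic solver would fall outside the setting this sketch illustrates.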


Pages: 9

50 entries in total

[21] Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously
Zimmert, Julian
Luo, Haipeng
Wei, Chen-Yu
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
[22] Exploiting Structure of Uncertainty for Efficient Matroid Semi-Bandits
Perrault, Pierre
Perchet, Vianney
Valko, Michal
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
[23] Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms
Liu, Xutong
Zuo, Jinhang
Wang, Siwei
Joe-Wong, Carlee
Lui, John C. S.
Chen, Wei
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
[24] A Thompson Sampling Algorithm for Cascading Bandits
Cheung, Wang Chi
Tan, Vincent Y. F.
Zhong, Zixin
22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89 : 438 - 447
[25] Thompson Sampling for Linearly Constrained Bandits
Saxena, Vidit
Gonzalez, Joseph E.
Jalden, Joakim
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108
[26] Double Thompson Sampling for Dueling Bandits
Wu, Huasen
Liu, Xin
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
[27] On the Performance of Thompson Sampling on Logistic Bandits
Dong, Shi
Ma, Tengyu
Van Roy, Benjamin
CONFERENCE ON LEARNING THEORY, VOL 99, 2019, 99
[28] Thompson Sampling Algorithms for Cascading Bandits
Zhong, Zixin
Cheung, Wang Chi
Tan, Vincent Y. F.
JOURNAL OF MACHINE LEARNING RESEARCH, 2021, 22
[29] Thompson Sampling on Symmetric α-Stable Bandits
Dubey, Abhimanyu
Pentland, Alex Sandy
PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 5715 - 5721
[30] Thompson Sampling for Bandits with Clustered Arms
Carlsson, Emil
Dubhashi, Devdatt
Johansson, Fredrik D.
PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 2212 - 2218
