Thompson Sampling for Combinatorial Semi-Bandits

Cited by: 0

Authors:

Wang, Siwei [1]

Chen, Wei [2]

Affiliations:

[1] Tsinghua Univ, Beijing, Peoples R China

[2] Microsoft Res, Beijing, Peoples R China

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80 | 2018 / Vol. 80

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We study the application of the Thompson sampling (TS) methodology to the stochastic combinatorial multi-armed bandit (CMAB) framework. We analyze the standard TS algorithm for the general CMAB and obtain the first distribution-dependent regret bound of $O(m \log T / \Delta_{\min})$ for TS under general CMAB, where $m$ is the number of arms, $T$ is the time horizon, and $\Delta_{\min}$ is the minimum gap between the expected reward of the optimal solution and that of any non-optimal solution. We also show that one cannot use an approximate oracle in the TS algorithm, even for MAB problems. We then extend the analysis to the matroid bandit, a special case of CMAB for which we can remove the independence assumption across arms and achieve a better regret bound. Finally, we use experiments to compare the regrets of the CUCB and CTS algorithms.
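To make the setting in the abstract concrete, the following is a minimal sketch of combinatorial Thompson sampling with semi-bandit feedback. It is an illustration under stated assumptions, not the authors' implementation: the base arms are assumed to be Bernoulli, the priors are assumed to be Beta(1, 1), the reward is assumed to be the sum of the played arms' outcomes, and the exact oracle is assumed to pick the k arms with the largest sampled means (a uniform matroid). The function name `cts_top_k` and all parameter names are hypothetical.

```python
import random


def cts_top_k(means, k, horizon, seed=0):
    """Sketch of combinatorial Thompson sampling with Beta(1, 1) priors.

    means   -- assumed true Bernoulli means of the m base arms (illustrative)
    k       -- number of base arms played per round
    horizon -- number of rounds T
    Returns (cumulative expected regret, per-arm play counts).
    """
    rng = random.Random(seed)
    m = len(means)
    a = [1] * m  # Beta parameter: 1 + observed successes
    b = [1] * m  # Beta parameter: 1 + observed failures
    plays = [0] * m
    # Expected reward of the optimal super arm (the k largest true means).
    best = sum(sorted(means, reverse=True)[:k])
    regret = 0.0
    for _ in range(horizon):
        # Draw one independent posterior sample per base arm.
        theta = [rng.betavariate(a[i], b[i]) for i in range(m)]
        # Exact oracle for a sum reward over size-k subsets:
        # take the k arms with the largest samples.
        chosen = sorted(range(m), key=lambda i: theta[i], reverse=True)[:k]
        regret += best - sum(means[i] for i in chosen)
        # Semi-bandit feedback: observe the outcome of every played arm.
        for i in chosen:
            x = 1 if rng.random() < means[i] else 0
            a[i] += x
            b[i] += 1 - x
            plays[i] += 1
    return regret, plays


if __name__ == "__main__":
    total_regret, counts = cts_top_k([0.9, 0.8, 0.3, 0.2, 0.1], k=2, horizon=2000)
    print("cumulative expected regret:", round(total_regret, 2))
    print("plays per arm:", counts)
```

The oracle here is exact on purpose: the abstract states that an approximate oracle cannot be used in the TS algorithm, so the sketch avoids any approximation in the selection step.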

Pages: 9

50 records in total

[41] Contextual Combinatorial Cascading Thompson Sampling
Zhu, Zhenyu
Huang, Liusheng
Xu, Hongli
WIRELESS ALGORITHMS, SYSTEMS, AND APPLICATIONS, WASA 2019, 2019, 11604 : 520 - 532
[42] Thompson Sampling for (Combinatorial) Pure Exploration
Wang, Siwei
Zhu, Jun
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
[43] Noise-Adaptive Thompson Sampling for Linear Contextual Bandits
Xu, Ruitu
Min, Yifei
Wang, Tianhao
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
[44] Evolutionary Multi-Armed Bandits with Genetic Thompson Sampling
Lin, Baihan
2022 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2022,
[45] Consensus-Based Thompson Sampling for Stochastic Multiarmed Bandits
Hayashi, Naoki
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2025, 70 (01) : 293 - 306
[46] Thompson Sampling for Robust Transfer in Multi-Task Bandits
Wang, Zhi
Zhang, Chicheng
Chaudhuri, Kamalika
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
[47] Optimal Thompson Sampling strategies for support-aware CVaR bandits
Baudry, Dorian
Gautron, Romain
Kaufmann, Emilie
Maillard, Odalric-Ambrym
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
[48] Asymptotic Performance of Thompson Sampling for Batched Multi-Armed Bandits
Kalkanli, Cem
Ozgur, Ayfer
IEEE TRANSACTIONS ON INFORMATION THEORY, 2023, 69 (09) : 5956 - 5970
[49] Asymptotic Performance of Thompson Sampling in the Batched Multi-Armed Bandits
Kalkanli, Cem
Ozgur, Ayfer
2021 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2021, : 539 - 544
[50] Thompson sampling for multi-armed bandits in big data environments
Kim, Min Kyong
Hwang, Beom Seuk
KOREAN JOURNAL OF APPLIED STATISTICS, 2024, 37 (05)
