Thompson Sampling for Combinatorial Semi-Bandits

Cited by: 0
Authors
Wang, Siwei [1]
Chen, Wei [2]
Affiliations
[1] Tsinghua University, Beijing, People's Republic of China
[2] Microsoft Research, Beijing, People's Republic of China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We study the application of the Thompson sampling (TS) methodology to the stochastic combinatorial multi-armed bandit (CMAB) framework. We analyze the standard TS algorithm for the general CMAB and obtain the first distribution-dependent regret bound of $O(m \log T / \Delta_{\min})$ for TS under general CMAB, where $m$ is the number of arms, $T$ is the time horizon, and $\Delta_{\min}$ is the minimum gap between the expected reward of the optimal solution and that of any non-optimal solution. We also show that one cannot use an approximate oracle in the TS algorithm, even for MAB problems. We then extend the analysis to the matroid bandit, a special case of CMAB for which we can remove the independence assumption across arms and achieve a better regret bound. Finally, we compare the regrets of the CUCB and CTS algorithms experimentally.
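As a rough illustration of the setting the abstract describes, the sketch below runs Thompson sampling with independent Beta posteriors on a top-k semi-bandit, where the learner picks k of m arms each round and observes the outcome of every chosen arm. The Bernoulli arms, the Beta(1, 1) priors, the exact top-k oracle, and all names (`cts_top_k`, `top_k_oracle`, `true_means`) are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def top_k_oracle(theta, k):
    """Exact oracle (assumed for this sketch): indices of the k largest sampled means."""
    return np.argsort(theta)[-k:]


def cts_top_k(true_means, k, horizon, seed=0):
    """Thompson sampling on a top-k semi-bandit with Bernoulli arms (illustrative only)."""
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)
    m = len(true_means)
    a = np.ones(m)  # Beta posterior: 1 + number of observed successes
    b = np.ones(m)  # Beta posterior: 1 + number of observed failures
    best = np.sort(true_means)[-k:].sum()  # expected reward of the optimal set
    regret = 0.0
    for _ in range(horizon):
        theta = rng.beta(a, b)            # one posterior sample per arm
        chosen = top_k_oracle(theta, k)   # solve the offline problem on the sample
        # Semi-bandit feedback: the outcome of every played arm is observed.
        outcomes = (rng.random(k) < true_means[chosen]).astype(float)
        a[chosen] += outcomes
        b[chosen] += 1.0 - outcomes
        regret += best - true_means[chosen].sum()
    return regret, a, b


if __name__ == "__main__":
    total_regret, a, b = cts_top_k([0.9, 0.8, 0.3, 0.2, 0.1], k=2, horizon=2000)
    print("cumulative expected regret:", total_regret)
    print("pulls per arm:", a + b - 2)
```

The top-k problem is a uniform matroid with a linear reward, so it falls in the matroid special case mentioned above. The oracle here is exact; the abstract's negative result concerns approximate oracles, which this sketch does not model.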
Pages: 9