Consensus-Based Thompson Sampling for Stochastic Multiarmed Bandits

Cited by: 0
Author
Hayashi, Naoki [1 ]
Affiliation
[1] Osaka Univ, Grad Sch Engn Sci, Toyonaka 560-8531, Japan
Funding
Japan Society for the Promotion of Science (JSPS)
Keywords
Bayes methods; Optimization; Multi-agent systems; Stochastic processes; Information exchange; Scalability; Power system stability; Distributed Thompson sampling; multiagent system; stochastic bandit problem; OPTIMIZATION;
DOI
10.1109/TAC.2024.3426379
Chinese Library Classification
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
This article considers a distributed Thompson sampling algorithm for a cooperative multiplayer multiarmed bandit problem. We consider a multiagent system in which each agent pulls an arm according to consensus-based Bayesian inference with probability matching. To estimate the reward probability of each arm, the agents share their observed rewards with neighboring agents over a communication graph. Following this information exchange, each agent updates its estimate of the posterior distributions based on its own observed reward and the information received from its neighbors. Each agent then decides which arm to pull at the next iteration based on the estimated posterior distribution. We demonstrate that the expected regret of the multiagent system under the proposed distributed Thompson sampling algorithm grows logarithmically with the number of iterations. Numerical examples show that the agents can effectively identify the optimal arm by cooperatively learning the reward distributions of the arms.
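The cooperative scheme described in the abstract can be illustrated with a short simulation. Below is a minimal sketch, assuming Bernoulli rewards with Beta priors; the four-agent graph, the weight matrix W, and the averaging of Beta parameters over neighbors are illustrative assumptions for this sketch, not the paper's exact consensus update.

# Minimal sketch: consensus-based Thompson sampling for a cooperative
# multiagent Bernoulli bandit. The network, weights, and Beta-parameter
# averaging below are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)

K = 5                                    # number of arms
true_means = rng.uniform(0.1, 0.9, K)    # unknown Bernoulli reward probabilities

# Undirected communication graph over 4 agents (adjacency with self-loops).
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)
W = A / A.sum(axis=1, keepdims=True)     # row-stochastic consensus weights
n_agents = A.shape[0]

# Per-agent Beta(alpha, beta) posterior statistics (successes / failures).
alpha = np.ones((n_agents, K))
beta = np.ones((n_agents, K))

T = 2000
for t in range(T):
    # 1) Probability matching: each agent samples from its posterior
    #    and pulls the arm with the largest sampled mean.
    theta = rng.beta(alpha, beta)                  # shape (n_agents, K)
    arms = theta.argmax(axis=1)
    rewards = rng.binomial(1, true_means[arms])    # Bernoulli rewards

    # 2) Local Bayesian update from each agent's own observation.
    for i, (a, r) in enumerate(zip(arms, rewards)):
        alpha[i, a] += r
        beta[i, a] += 1 - r

    # 3) Consensus step: mix posterior statistics with neighbors so that
    #    observations propagate through the communication graph.
    alpha = W @ alpha
    beta = W @ beta

print("true optimal arm:", true_means.argmax())
print("arm each agent believes optimal:", (alpha / (alpha + beta)).argmax(axis=1))

In this sketch all agents typically concentrate on the same arm after a few hundred iterations, which is the qualitative behavior the numerical examples in the article report.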
Pages: 293-306 (14 pages)