Consensus-Based Thompson Sampling for Stochastic Multiarmed Bandits

Cited by: 0
Authors
Hayashi, Naoki [1 ]
Affiliations
[1] Osaka Univ, Grad Sch Engn Sci, Toyonaka 5608531, Japan
Funding
Japan Society for the Promotion of Science
Keywords
Bayes methods; Optimization; Multi-agent systems; Stochastic processes; Information exchange; Scalability; Power system stability; Distributed Thompson sampling; multiagent system; stochastic bandit problem; OPTIMIZATION;
DOI
10.1109/TAC.2024.3426379
CLC classification
TP [automation technology; computer technology]
Discipline classification code
0812
Abstract
This article considers a distributed Thompson sampling algorithm for a cooperative multiplayer multiarmed bandit problem. We consider a multiagent system in which each agent pulls an arm according to consensus-based Bayesian inference with probability matching. To estimate the reward probability of each arm, a group of agents shares the observed rewards with neighboring agents in a communication graph. Following the information exchange, each agent updates its estimate of the posterior distributions based on its own observed reward and the information received from neighboring agents. Each agent then decides which arm to select at the next iteration based on the estimated posterior distribution. We demonstrate that the expected regret of the multiagent system under the proposed distributed Thompson sampling algorithm grows logarithmically in the number of iterations. Numerical examples show that agents can effectively identify the optimal arm by cooperatively learning the reward distributions of the set of arms.
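The loop described in the abstract (sample from each arm's posterior, pull the argmax, exchange observations with graph neighbors, update posteriors) can be sketched for Bernoulli rewards as below. This is a minimal illustrative sketch, not the paper's exact consensus update rule; the function name `consensus_thompson_sampling` and the reward-sharing scheme (each agent folds its neighbors' raw observations into its own Beta posterior) are assumptions for illustration.

```python
import random

def consensus_thompson_sampling(true_probs, neighbors, horizon, seed=0):
    """Illustrative multiagent Thompson sampling on Bernoulli arms.

    Each agent keeps a Beta(alpha, beta) posterior per arm, samples a
    mean-reward estimate for every arm (probability matching), pulls the
    argmax, then shares its (arm, reward) observation with its graph
    neighbors. Returns the fraction of pulls that hit the best arm.
    """
    rng = random.Random(seed)
    n_agents, n_arms = len(neighbors), len(true_probs)
    # Uniform Beta(1, 1) priors for every agent and arm.
    alpha = [[1.0] * n_arms for _ in range(n_agents)]
    beta = [[1.0] * n_arms for _ in range(n_agents)]
    best = max(range(n_arms), key=lambda k: true_probs[k])
    pulls_of_best = 0
    for _ in range(horizon):
        observations = []
        for i in range(n_agents):
            # Probability matching: sample from each posterior, pull argmax.
            samples = [rng.betavariate(alpha[i][k], beta[i][k])
                       for k in range(n_arms)]
            arm = max(range(n_arms), key=lambda k: samples[k])
            reward = 1 if rng.random() < true_probs[arm] else 0
            observations.append((arm, reward))
            pulls_of_best += (arm == best)
        # Information exchange: each agent incorporates its own observation
        # and those of its neighbors in the communication graph.
        for i in range(n_agents):
            for j in [i] + list(neighbors[i]):
                arm, reward = observations[j]
                alpha[i][arm] += reward
                beta[i][arm] += 1 - reward
    return pulls_of_best / (n_agents * horizon)
```

For example, three agents on a complete graph with arms of mean reward 0.2 and 0.8 concentrate their pulls on the better arm well within a few thousand rounds, consistent with the logarithmic-regret behavior the abstract claims.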
Pages: 293-306
Page count: 14
Related papers
50 entries
  • [31] Asymptotic Performance of Thompson Sampling for Batched Multi-Armed Bandits
    Kalkanli, Cem
    Ozgur, Ayfer
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2023, 69 (09) : 5956 - 5970
  • [32] Thompson sampling for multi-armed bandits in big data environments
    Kim, Min Kyong
    Hwang, Beom Seuk
    KOREAN JOURNAL OF APPLIED STATISTICS, 2024, 37 (05)
  • [33] Asymptotic Performance of Thompson Sampling in the Batched Multi-Armed Bandits
    Kalkanli, Cem
    Ozgur, Ayfer
    2021 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2021, : 539 - 544
  • [34] Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits
    Chakraborty, Sunrit
    Roy, Saptarshi
    Tewari, Ambuj
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202, 2023, 202
  • [35] Solving Bernoulli Rank-One Bandits with Unimodal Thompson Sampling
    Trinh, Cindy
    Kaufmann, Emilie
    Vernade, Claire
    Combes, Richard
    ALGORITHMIC LEARNING THEORY, VOL 117, 2020, 117 : 862 - 889
  • [36] A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits
    Chang, Joel Q. L.
    Tan, Vincent Y. F.
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 6159 - 6166
  • [37] DOUBLE-LINEAR THOMPSON SAMPLING FOR CONTEXT-ATTENTIVE BANDITS
    Bouneffouf, Djallel
    Feraud, Raphael
    Upadhyay, Sohini
    Khazaeni, Yasaman
    Rish, Irina
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 3450 - 3454
  • [38] Double Doubly Robust Thompson Sampling for Generalized Linear Contextual Bandits
    Kim, Wonyoung
    Lee, Kyungbok
    Paik, Myunghee Cho
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 7, 2023, : 8300 - 8307
  • [39] PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits
    Dumitrascu, Bianca
    Feng, Karen
    Engelhardt, Barbara E.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [40] Feel-Good Thompson Sampling for Contextual Bandits and Reinforcement Learning
    Zhang, Tong
    SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE, 2022, 4 (02): : 834 - 857