Consensus-Based Thompson Sampling for Stochastic Multiarmed Bandits

Cited by: 0
Authors
Hayashi, Naoki [1]
Affiliation
[1] Osaka Univ, Grad Sch Engn Sci, Toyonaka 5608531, Japan
Funding
Japan Society for the Promotion of Science
Keywords
Bayes methods; Optimization; Multi-agent systems; Stochastic processes; Information exchange; Scalability; Power system stability; Distributed Thompson sampling; multiagent system; stochastic bandit problem
DOI
10.1109/TAC.2024.3426379
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This article considers a distributed Thompson sampling algorithm for a cooperative multiplayer multiarmed bandit problem. We consider a multiagent system in which each agent pulls an arm according to consensus-based Bayesian inference with probability matching. To estimate the reward probability of each arm, the agents share their observed rewards with neighboring agents in a communication graph. After this information exchange, each agent updates its estimate of the posterior distributions based on its own observed reward and the information received from its neighbors. Each agent then decides which arm to select at the next iteration based on the estimated posterior distribution. We show that the expected regret of the multiagent system under the proposed distributed Thompson sampling algorithm grows logarithmically with the number of iterations. Numerical examples show that agents can effectively identify the optimal arm by cooperatively learning the reward distributions of the arms.
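The loop described in the abstract (sample from a posterior, pull an arm, exchange information with neighbors, update) can be illustrated with a minimal sketch. This is not the article's algorithm, whose update rule is not given in this record. It assumes Bernoulli rewards with Beta posteriors, a ring communication graph with fixed doubly stochastic weights, and a running-consensus update of success and failure counts. The names `ring_weights` and `run_consensus_ts` and all parameter values are hypothetical.

```python
import numpy as np


def ring_weights(n_agents):
    """Doubly stochastic weights for a ring graph (an assumed topology)."""
    W = np.zeros((n_agents, n_agents))
    for i in range(n_agents):
        W[i, i] = 0.5
        W[i, (i - 1) % n_agents] += 0.25
        W[i, (i + 1) % n_agents] += 0.25
    return W


def run_consensus_ts(true_means, n_agents=5, horizon=2000, seed=0):
    """Return a (n_agents, n_arms) array of pull counts.

    Assumption: each agent keeps consensus-averaged success/failure
    counts and treats them as Beta posterior parameters.
    """
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)
    n_arms = len(true_means)
    W = ring_weights(n_agents)
    idx = np.arange(n_agents)

    succ = np.zeros((n_agents, n_arms))  # estimated success counts
    fail = np.zeros((n_agents, n_arms))  # estimated failure counts
    pulls = np.zeros((n_agents, n_arms), dtype=int)

    for _ in range(horizon):
        # Probability matching: one posterior sample per agent and arm,
        # then each agent pulls the arm with the largest sample.
        theta = rng.beta(succ + 1.0, fail + 1.0)
        arms = theta.argmax(axis=1)

        # Bernoulli rewards drawn from the (unknown) true means.
        rewards = rng.random(n_agents) < true_means[arms]

        # Each agent's new local observation as a one-hot update.
        new_succ = np.zeros_like(succ)
        new_fail = np.zeros_like(fail)
        new_succ[idx, arms] = rewards
        new_fail[idx, arms] = ~rewards

        # Consensus step: mix neighbors' estimates, then add own observation.
        succ = W @ succ + new_succ
        fail = W @ fail + new_fail
        pulls[idx, arms] += 1

    return pulls


if __name__ == "__main__":
    print(run_consensus_ts([0.2, 0.5, 0.8]))
```

Because the weight matrix is doubly stochastic, the network-wide sum of the counts is preserved at every consensus step, so each agent's estimate tracks the network average rather than only its own pulls.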
Pages: 293-306
Number of pages: 14