Constant or Logarithmic Regret in Asynchronous Multiplayer Bandits with Limited Communication

Authors
Richard, Hugo [1]
Boursier, Etienne [2]
Perchet, Vianney
Affiliations
[1] Criteo AI Lab, FAIRPLAY Joint Team, Paris, France
[2] Univ Paris Saclay LMO, INRIA, Orsay, France
Keywords
MULTIARMED BANDIT; REWARDS;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multiplayer bandits have recently garnered significant attention due to their relevance in cognitive radio networks. While the existing body of literature predominantly focuses on synchronous players, real-world radio networks, such as those in IoT applications, often feature asynchronous (i.e., randomly activated) devices. This highlights the need for addressing the more challenging asynchronous multiplayer bandits problem. Our first result shows that a natural extension of UCB achieves a minimax regret of O(√(T log(T))) in the centralized setting. More significantly, we introduce Cautious Greedy, which uses O(log(T)) communications and whose instance-dependent regret is constant if the optimal policy assigns at least one player to each arm (a situation proven to occur when arm means are sufficiently close). Otherwise, the regret is, as usual, log(T) times the sum of some inverse sub-optimality gaps. We substantiate the optimality of Cautious Greedy through lower-bound analysis based on data-dependent terms. Therefore, we establish a strong baseline for asynchronous multiplayer bandits, at least with O(log(T)) communications.
Pages: 43