Constant or Logarithmic Regret in Asynchronous Multiplayer Bandits with Limited Communication

Times cited: 0

Authors:

Richard, Hugo [1]

Boursier, Etienne [2]

Perchet, Vianney

Affiliations:

[1] Criteo AI Lab, FAIRPLAY Joint Team, Paris, France

[2] Univ Paris Saclay LMO, INRIA, Orsay, France

Source:

INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238 | 2024

Keywords:

MULTIARMED BANDIT; REWARDS

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Multiplayer bandits have recently garnered significant attention due to their relevance to cognitive radio networks. While the existing literature predominantly focuses on synchronous players, real-world radio networks, such as those in IoT applications, often feature asynchronous (i.e., randomly activated) devices. This motivates the study of the more challenging asynchronous multiplayer bandits problem. Our first result shows that a natural extension of UCB achieves a minimax regret of O(√(T log(T))) in the centralized setting. More significantly, we introduce Cautious Greedy, an algorithm that uses O(log(T)) communications and whose instance-dependent regret is constant if the optimal policy assigns at least one player to each arm (a situation proven to occur when the arm means are sufficiently close). Otherwise, the regret is, as usual, log(T) times the sum of some inverse sub-optimality gaps. We substantiate the optimality of Cautious Greedy through a lower-bound analysis based on data-dependent terms. Cautious Greedy therefore establishes a strong baseline for asynchronous multiplayer bandits, at least with O(log(T)) communications.
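For readers unfamiliar with UCB, the following is a minimal sketch of the classical single-player UCB1 index (empirical mean plus an exploration bonus sqrt(2 ln t / n_k)), which is the kind of rule the centralized algorithm in the abstract builds on. It is an assumption-laden illustration only: it is not the authors' asynchronous multiplayer extension, whose assignment rule is not described in this record, and the function names `ucb1_indices` and `select_arm` are hypothetical.

```python
import math
from typing import List, Sequence


def ucb1_indices(reward_sums: Sequence[float], pull_counts: Sequence[int], t: int) -> List[float]:
    """Classical UCB1 index per arm: empirical mean + sqrt(2 * ln(t) / n_k).

    Hypothetical helper (not from the paper). Arms that have never been
    pulled get an infinite index so that each arm is tried at least once.
    Assumes t >= 1.
    """
    indices: List[float] = []
    for total, n in zip(reward_sums, pull_counts):
        if n == 0:
            indices.append(math.inf)
        else:
            mean = total / n                          # empirical mean reward
            bonus = math.sqrt(2.0 * math.log(t) / n)  # exploration bonus
            indices.append(mean + bonus)
    return indices


def select_arm(reward_sums: Sequence[float], pull_counts: Sequence[int], t: int) -> int:
    """Return the arm with the largest UCB1 index (ties go to the lowest arm index)."""
    indices = ucb1_indices(reward_sums, pull_counts, t)
    return max(range(len(indices)), key=lambda k: indices[k])


if __name__ == "__main__":
    # Arm 0: 4 pulls, total reward 3.0; arm 1: 2 pulls, total reward 1.0; round t = 10.
    print(ucb1_indices([3.0, 1.0], [4, 2], 10))
    print(select_arm([3.0, 1.0], [4, 2], 10))  # prints 1: arm 1's larger bonus outweighs its lower mean
```

In this toy call, arm 1 is selected despite its lower empirical mean (0.5 versus 0.75) because it has been pulled less often, which is the optimism-under-uncertainty principle behind UCB.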

Pages: 43