Nash Regret Guarantees for Linear Bandits

Cited by: 0
Authors
Sawarni, Ayush [1 ]
Pal, Soumyabrata [2 ]
Barman, Siddharth [1 ]
Institutions
[1] Indian Inst Sci, Bangalore, Karnataka, India
[2] Google Res, Bangalore, Karnataka, India
Keywords
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We obtain essentially tight upper bounds for a strengthened notion of regret in the stochastic linear bandits framework. The strengthening, referred to as Nash regret, is defined as the difference between the (a priori unknown) optimum and the geometric mean of the expected rewards accumulated by the linear bandit algorithm. Since the geometric mean corresponds to the well-studied Nash social welfare (NSW) function, this formulation quantifies the performance of a bandit algorithm as the collective welfare it generates across rounds. NSW is known to satisfy fairness axioms and, hence, an upper bound on Nash regret provides a principled fairness guarantee. We consider the stochastic linear bandits problem over a horizon of T rounds and with a set of arms X in ambient dimension d. Furthermore, we focus on settings in which the stochastic reward, associated with each arm in X, is a non-negative, ν-sub-Poisson random variable. For this setting, we develop an algorithm that achieves a Nash regret of O(√(dν/T) log(T|X|)). In addition, addressing linear bandit instances in which the set of arms X is not necessarily finite, we obtain a Nash regret upper bound of O((d^{5/4}√ν/√T) log T). Since bounded random variables are sub-Poisson, these results hold for bounded, positive rewards. Our linear bandit algorithm is built upon the successive elimination method with novel technical insights, including tailored concentration bounds and the use of sampling via the John ellipsoid in conjunction with the Kiefer-Wolfowitz optimal design.
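As an illustration of the central metric (a minimal sketch, not code from the paper), Nash regret on a toy instance is the gap between the optimal expected reward and the geometric mean of the expected rewards collected over the T rounds. The 2-armed instance and reward values below are hypothetical; the point is that the geometric mean penalizes low-reward rounds more heavily than the arithmetic mean used in standard average regret.

```python
import math

def nash_regret(expected_rewards, optimum):
    """Nash regret: the optimum minus the geometric mean of the
    expected rewards accumulated over the T rounds played.
    Computed in log space for numerical stability; all expected
    rewards must be strictly positive."""
    T = len(expected_rewards)
    log_mean = sum(math.log(r) for r in expected_rewards) / T
    return optimum - math.exp(log_mean)

# Hypothetical 2-armed instance: expected rewards 1.0 (optimal)
# and 0.5.  An algorithm that pulls the suboptimal arm in half
# of T = 10 rounds incurs average regret 0.25, but Nash regret
# 1 - sqrt(0.5) ~ 0.2929, since the geometric mean of the
# collected rewards is at most their arithmetic mean.
pulls = [1.0, 0.5] * 5
opt = 1.0
avg_regret = opt - sum(pulls) / len(pulls)  # 0.25
print(nash_regret(pulls, opt))              # ~ 0.2929
```

Because the geometric mean is dragged down sharply by any near-zero reward, a small Nash regret forces the algorithm to earn reasonable rewards in essentially every round, which is the fairness-across-rounds interpretation the abstract draws from the NSW function.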
Pages: 31