Nash Regret Guarantees for Linear Bandits

Cited: 0
Authors
Sawarni, Ayush [1 ]
Pal, Soumyabrata [2 ]
Barman, Siddharth [1 ]
Affiliations
[1] Indian Inst Sci, Bangalore, Karnataka, India
[2] Google Res, Bangalore, Karnataka, India
DOI: Not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory];
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
We obtain essentially tight upper bounds for a strengthened notion of regret in the stochastic linear bandits framework. The strengthening, referred to as Nash regret, is defined as the difference between the (a priori unknown) optimum and the geometric mean of the expected rewards accumulated by the linear bandit algorithm. Since the geometric mean corresponds to the well-studied Nash social welfare (NSW) function, this formulation quantifies the performance of a bandit algorithm as the collective welfare it generates across rounds. NSW is known to satisfy fairness axioms and, hence, an upper bound on Nash regret provides a principled fairness guarantee. We consider the stochastic linear bandits problem over a horizon of $T$ rounds and with a set of arms $X$ in ambient dimension $d$. Furthermore, we focus on settings in which the stochastic reward, associated with each arm in $X$, is a non-negative, $\nu$-sub-Poisson random variable. For this setting, we develop an algorithm that achieves a Nash regret of $O\left(\sqrt{\frac{d\nu}{T}}\,\log(T|X|)\right)$. In addition, addressing linear bandit instances in which the set of arms $X$ is not necessarily finite, we obtain a Nash regret upper bound of $O\left(\frac{d^{5/4}\nu^{1/2}}{\sqrt{T}}\,\log T\right)$. Since bounded random variables are sub-Poisson, these results hold for bounded, positive rewards. Our linear bandit algorithm is built upon the successive elimination method with novel technical insights, including tailored concentration bounds and the use of sampling via the John ellipsoid in conjunction with the Kiefer-Wolfowitz optimal design.
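To make this notion concrete, the definition sketched in the abstract can be written out as follows; the notation ($\theta^*$ for the unknown parameter vector, $x_t \in X$ for the arm pulled in round $t$) is assumed here for illustration:

\[ \mathrm{NR}_T \;=\; \max_{x \in X} \langle x, \theta^* \rangle \;-\; \Big( \prod_{t=1}^{T} \mathbb{E}\big[\langle x_t, \theta^* \rangle\big] \Big)^{1/T}. \]

Standard average regret would instead subtract the arithmetic mean $\frac{1}{T}\sum_{t=1}^{T} \mathbb{E}[\langle x_t, \theta^* \rangle]$; since the geometric mean never exceeds the arithmetic mean (AM-GM), a bound on Nash regret is the stronger guarantee, which is why the abstract calls it a strengthened notion.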
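The Kiefer-Wolfowitz optimal design invoked in the abstract's last sentence can be approximated with the classical Fedorov-Wynn / Frank-Wolfe iteration over a finite arm set. The sketch below illustrates that generic design-computation step only, not the paper's algorithm (which further combines the design with John-ellipsoid sampling and successive elimination); the function name and parameters are ours.

import numpy as np

def kw_optimal_design(X, n_iters=1000, reg=1e-8):
    """Approximate a Kiefer-Wolfowitz optimal design over the rows of X
    via Frank-Wolfe updates on the design distribution pi (illustrative sketch)."""
    n, d = X.shape
    pi = np.full(n, 1.0 / n)                      # start from the uniform design
    for _ in range(n_iters):
        # Information matrix A(pi) = sum_i pi_i x_i x_i^T, ridged for stability.
        A = X.T @ (pi[:, None] * X) + reg * np.eye(d)
        A_inv = np.linalg.inv(A)
        # Predicted variance of each arm under the design: g_i = x_i^T A^{-1} x_i.
        g = np.einsum('ij,jk,ik->i', X, A_inv, X)
        i = int(np.argmax(g))                     # arm with the largest variance
        # Closed-form Frank-Wolfe step size for the D-optimal objective.
        gamma = max((g[i] / d - 1.0) / (g[i] - 1.0), 0.0)
        pi = (1.0 - gamma) * pi
        pi[i] += gamma
    return pi

By the Kiefer-Wolfowitz equivalence theorem, at the optimum the largest predicted variance max_i g_i equals d, so sampling arms from the returned distribution controls the estimation error of every arm's expected reward uniformly.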
Pages: 31
Related Papers
50 records in total
  • [21] Neural Contextual Bandits without Regret
    Kassraie, Parnian
    Krause, Andreas
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151 : 240 - 278
  • [22] Regret of Age-of-Information Bandits
    Fatale, Santosh
    Bhandari, Kavya
    Narula, Urvidh
    Moharir, Sharayu
    Hanawal, Manjesh K.
    IEEE TRANSACTIONS ON COMMUNICATIONS, 2022, 70 (01) : 87 - 100
  • [23] Regret bounds for sleeping experts and bandits
    Kleinberg, Robert
    Niculescu-Mizil, Alexandru
    Sharma, Yogeshwer
    MACHINE LEARNING, 2010, 80 (2-3) : 245 - 272
  • [24] Tight First- and Second-Order Regret Bounds for Adversarial Linear Bandits
    Ito, Shinji
    Hirahara, Shuichi
    Soma, Tasuku
    Yoshida, Yuichi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [25] Improved Regret Analysis for Variance-Adaptive Linear Bandits and Horizon-Free Linear Mixture MDPs
    Kim, Yeoneung
    Yang, Insoon
    Jun, Kwang-Sung
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022
  • [26] Regret Guarantees for Online Deep Control
    Chen, Xinyi
    Minasyan, Edgar
    Lee, Jason D.
    Hazan, Elad
    LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, VOL 211, 2023, 211
  • [27] On the Guarantees of Minimizing Regret in Receding Horizon
    Martin, Andrea
    Furieri, Luca
    Dorfler, Florian
    Lygeros, John
    Ferrari-Trecate, Giancarlo
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2025, 70 (03) : 1547 - 1562
  • [28] Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency
    Zhao, Heyang
    He, Jiafan
    Zhou, Dongruo
    Zhang, Tong
    Gu, Quanquan
    THIRTY SIXTH ANNUAL CONFERENCE ON LEARNING THEORY, VOL 195, 2023, 195
  • [29] Breaking the √T Barrier: Instance-Independent Logarithmic Regret in Stochastic Contextual Linear Bandits
    Ghosh, Avishek
    Sankararaman, Abishek
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [30] Lenient Regret for Multi-Armed Bandits
    Merlis, Nadav
    Mannor, Shie
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 8950 - 8957