Nash Regret Guarantees for Linear Bandits

Cited by: 0
Authors
Sawarni, Ayush [1 ]
Pal, Soumyabrata [2 ]
Barman, Siddharth [1 ]
Affiliations
[1] Indian Inst Sci, Bangalore, Karnataka, India
[2] Google Res, Bangalore, Karnataka, India
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We obtain essentially tight upper bounds for a strengthened notion of regret in the stochastic linear bandits framework. The strengthening, referred to as Nash regret, is defined as the difference between the (a priori unknown) optimum and the geometric mean of the expected rewards accumulated by the linear bandit algorithm. Since the geometric mean corresponds to the well-studied Nash social welfare (NSW) function, this formulation quantifies the performance of a bandit algorithm as the collective welfare it generates across rounds. NSW is known to satisfy fairness axioms and, hence, an upper bound on Nash regret provides a principled fairness guarantee. We consider the stochastic linear bandits problem over a horizon of T rounds and with a set of arms X in ambient dimension d. Furthermore, we focus on settings in which the stochastic reward associated with each arm in X is a non-negative, $\nu$-sub-Poisson random variable. For this setting, we develop an algorithm that achieves a Nash regret of $O\big(\sqrt{d\nu/T}\,\log(T|X|)\big)$. In addition, addressing linear bandit instances in which the set of arms X is not necessarily finite, we obtain a Nash regret upper bound of $O\big(d^{5/4}\sqrt{\nu}\,\log(T)/\sqrt{T}\big)$. Since bounded random variables are sub-Poisson, these results hold for bounded, positive rewards. Our linear bandit algorithm builds upon the successive elimination method with novel technical insights, including tailored concentration bounds and sampling via the John ellipsoid in conjunction with the Kiefer-Wolfowitz optimal design.
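For concreteness, the Nash regret sketched in the abstract can be written as the following worked equation; the notation here ($x_t$ for the arm selected in round $t$, $\theta^{*}$ for the unknown parameter vector, and $x^{*}$ for an optimal arm) is our own shorthand and does not appear in this record:

\[
\mathrm{NR}(T) \;=\; \langle x^{*}, \theta^{*} \rangle \;-\; \left( \prod_{t=1}^{T} \mathbb{E}\big[ \langle x_t, \theta^{*} \rangle \big] \right)^{1/T},
\qquad \text{where } x^{*} \in \arg\max_{x \in X}\, \langle x, \theta^{*} \rangle .
\]

Replacing the geometric mean by the arithmetic mean recovers the per-round average form of standard regret; since the geometric mean never exceeds the arithmetic mean, an upper bound on Nash regret is the stronger guarantee.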
Pages: 31
Related Papers (50 in total)
  • [1] Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees
    Tirinzoni, Andrea
    Papini, Matteo
    Touati, Ahmed
    Lazaric, Alessandro
    Pirotta, Matteo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [2] Experimental Design for Regret Minimization in Linear Bandits
    Wagenmaker, Andrew
    Katz-Samuels, Julian
    Jamieson, Kevin
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [3] No-Regret Linear Bandits beyond Realizability
    Liu, Chong
    Yin, Ming
    Wang, Yu-Xiang
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216 : 1294 - 1303
  • [4] Online Linear Quadratic Tracking With Regret Guarantees
    Karapetyan, Aren
    Bolliger, Diego
    Tsiamis, Anastasios
    Balta, Efe C.
    Lygeros, John
    IEEE CONTROL SYSTEMS LETTERS, 2023, 7 : 3950 - 3955
  • [5] Corralling a Larger Band of Bandits: A Case Study on Switching Regret for Linear Bandits
    Luo, Haipeng
    Zhang, Mengxiao
    Zhao, Peng
    Zhou, Zhi-Hua
    CONFERENCE ON LEARNING THEORY, VOL 178, 2022, 178
  • [6] Hybrid Regret Bounds for Combinatorial Semi-Bandits and Adversarial Linear Bandits
    Ito, Shinji
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [7] No-Regret Algorithms for Heavy-Tailed Linear Bandits
    Medina, Andres Munoz
    Yang, Scott
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [8] Regret of Queueing Bandits
    Krishnasamy, Subhashini
    Sen, Rajat
    Johari, Ramesh
    Shakkottai, Sanjay
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [9] Tight Regret Bounds for Infinite-armed Linear Contextual Bandits
    Li, Yingkai
    Wang, Yining
    Chen, Xi
    Zhou, Yuan
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130 : 370 - 378
  • [10] No DBA? No Regret! Multi-Armed Bandits for Index Tuning of Analytical and HTAP Workloads With Provable Guarantees
    Perera, R. Malinga
    Oetomo, Bastian
    Rubinstein, Benjamin I. P.
    Borovica-Gajic, Renata
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (12) : 12855 - 12872