Nash Regret Guarantees for Linear Bandits

Cited by: 0
Authors
Sawarni, Ayush [1 ]
Pal, Soumyabrata [2 ]
Barman, Siddharth [1 ]
Affiliations
[1] Indian Inst Sci, Bangalore, Karnataka, India
[2] Google Res, Bangalore, Karnataka, India
Keywords
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We obtain essentially tight upper bounds for a strengthened notion of regret in the stochastic linear bandits framework. The strengthening, referred to as Nash regret, is defined as the difference between the (a priori unknown) optimum and the geometric mean of the expected rewards accumulated by the linear bandit algorithm. Since the geometric mean corresponds to the well-studied Nash social welfare (NSW) function, this formulation quantifies the performance of a bandit algorithm as the collective welfare it generates across rounds. NSW is known to satisfy fairness axioms and, hence, an upper bound on Nash regret provides a principled fairness guarantee. We consider the stochastic linear bandits problem over a horizon of T rounds and with a set of arms X in ambient dimension d. Furthermore, we focus on settings in which the stochastic reward associated with each arm in X is a non-negative, ν-sub-Poisson random variable. For this setting, we develop an algorithm that achieves a Nash regret of O(√(dν/T) · log(T|X|)). In addition, addressing linear bandit instances in which the set of arms X is not necessarily finite, we obtain a Nash regret upper bound of O((d^(5/4) √ν / √T) · log T). Since bounded random variables are sub-Poisson, these results hold for bounded, positive rewards. Our linear bandit algorithm builds upon the successive elimination method with novel technical insights, including tailored concentration bounds and the use of sampling via the John ellipsoid in conjunction with the Kiefer-Wolfowitz optimal design.
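For reference, the Nash regret quantity described in the abstract can be written out explicitly. The display below is a sketch under the standard linear bandit assumption that an arm x has mean reward ⟨x, θ*⟩; the symbols θ* (the unknown parameter vector) and X_t (the arm pulled in round t) are notation introduced here, not part of this record.

  % Sketch (assumed notation): Nash regret after T rounds, where X_t is the arm
  % pulled in round t and theta* is the unknown parameter of the linear reward model.
  \[
    \mathrm{NR}(T) \;=\; \max_{x \in \mathcal{X}} \langle x, \theta^{*} \rangle
      \;-\; \Bigg( \prod_{t=1}^{T} \mathbb{E}\big[ \langle X_t, \theta^{*} \rangle \big] \Bigg)^{1/T}
  \]
  % The second term is the Nash social welfare (geometric mean) of the expected
  % per-round rewards; standard average regret uses the arithmetic mean instead,
  % so by the AM-GM inequality a Nash regret bound is the stronger guarantee.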
Pages: 31
相关论文
共 50 条
  • [41] No Regret in Cloud Resources Reservation with Violation Guarantees
    Liakopoulos, Nikolaos
    Paschos, Georgios
    Spyropoulos, Thrasyvoulos
    IEEE CONFERENCE ON COMPUTER COMMUNICATIONS (IEEE INFOCOM 2019), 2019, : 1747 - 1755
  • [42] Beating the Best Nash without Regret
    Ligett, Katrina
    Piliouras, Georgios
    ACM SIGECOM EXCHANGES, 2011, 10 (01) : 23 - 26
  • [43] Nonstationary Stochastic Bandits: UCB Policies and Minimax Regret
    Wei, Lai
    Srivastava, Vaibhav
    IEEE OPEN JOURNAL OF CONTROL SYSTEMS, 2024, 3 : 128 - 142
  • [44] No Weighted-Regret Learning in Adversarial Bandits with Delays
    Bistritz, Ilai
    Zhou, Zhengyuan
    Chen, Xi
    Bambos, Nicholas
    Blanchet, Jose
    JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23
  • [45] No Weighted-Regret Learning in Adversarial Bandits with Delays
    Bistritz, Ilai
    Zhou, Zhengyuan
    Cheny, Xi
    Bambos, Nicholas
    Blanchet, Jose
    Journal of Machine Learning Research, 2022, 23
  • [46] Algorithms with Logarithmic or Sublinear Regret for Constrained Contextual Bandits
    Wu, Huasen
    Srikant, R.
    Liu, Xin
    Jiang, Chong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
  • [47] Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms
    Combes, Richard
    Proutiere, Alexandre
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 1), 2014, 32
  • [48] Do plant guarantees matter? The role of satisfaction and regret when guarantees are present
    Dennis, JH
    Behe, BK
    Fernandez, RT
    Schutzki, R
    Page, TJ
    Spreng, RA
    HORTSCIENCE, 2005, 40 (01) : 142 - 145
  • [49] Bounded Regret for Finite-Armed Structured Bandits
    Lattimore, Tor
    Munos, Remi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27
  • [50] An Experimental Design Approach for Regret Minimization in Logistic Bandits
    Mason, Blake
    Jun, Kwang-Sung
    Jain, Lalit
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 7736 - 7743