Algorithms with Logarithmic or Sublinear Regret for Constrained Contextual Bandits

Cited by: 0
Authors
Wu, Huasen [1]
Srikant, R. [2]
Liu, Xin [1]
Jiang, Chong [2]
Affiliations
[1] Univ Calif Davis, Davis, CA 95616 USA
[2] Univ Illinois, Champaign, IL USA
Keywords
DOI
Not available
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study contextual bandits with budget and time constraints, referred to as constrained contextual bandits. The time and budget constraints significantly complicate the exploration and exploitation tradeoff because they introduce complex coupling among contexts over time. To gain insight, we first study unit-cost systems with a known context distribution. When the expected rewards are known, we develop an approximation of the oracle, referred to as Adaptive Linear Programming (ALP), which achieves near-optimality and requires only the ordering of the expected rewards. With these highly desirable features, we then combine ALP with the upper-confidence-bound (UCB) method in the general case where the expected rewards are unknown a priori. We show that the proposed UCB-ALP algorithm achieves logarithmic regret except in certain boundary cases. Further, we design algorithms and obtain similar regret bounds for more general systems with unknown context distributions and heterogeneous costs. To the best of our knowledge, this is the first work to show how to achieve logarithmic regret in constrained contextual bandits. Moreover, this work also sheds light on the study of computationally efficient algorithms for general constrained contextual bandits.
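
To make the ALP and UCB-ALP ideas in the abstract concrete, the sketch below is a minimal, illustrative Python rendering of the threshold structure for the unit-cost, known-context-distribution setting: rank contexts by their UCB reward estimates, take the highest-ranked contexts with probability one until the per-round budget rate (remaining budget over remaining time) is filled, and randomize at the boundary context. The function names, the simplified confidence radius, and the Bernoulli reward model are assumptions for illustration only; the paper's actual algorithm (its exact confidence terms, boundary-case handling, and the extensions to unknown context distributions and heterogeneous costs) is not reproduced here.

```python
import math
import random

def alp_probabilities(context_probs, ranked_contexts, budget, remaining_time):
    """ALP-style take-probabilities for unit-cost contexts: serve the
    highest-ranked contexts fully until the average budget rate
    budget / remaining_time is exhausted, then randomize at the boundary."""
    rho = budget / max(remaining_time, 1)           # average budget per remaining round
    probs = {j: 0.0 for j in ranked_contexts}
    used = 0.0
    for j in ranked_contexts:                       # sorted by estimated reward, descending
        pj = context_probs[j]
        if used + pj <= rho:
            probs[j] = 1.0                          # always take when this context arrives
            used += pj
        else:
            probs[j] = max(0.0, (rho - used) / pj)  # fractional take at the boundary context
            break
    return probs

def ucb_alp(contexts, context_probs, true_rewards, budget, horizon, seed=0):
    """Toy UCB-ALP loop: maintain UCB indices per context, order contexts
    by the indices, and act according to the ALP probabilities."""
    rng = random.Random(seed)
    counts = {j: 0 for j in contexts}
    means = {j: 0.0 for j in contexts}
    b, total_reward = budget, 0.0
    for t in range(1, horizon + 1):
        # Context arrives i.i.d. from the (assumed known) distribution.
        j = rng.choices(contexts, weights=[context_probs[c] for c in contexts])[0]
        ucb = {c: means[c] + math.sqrt(math.log(t + 1) / (2 * max(counts[c], 1)))
               for c in contexts}
        ranked = sorted(contexts, key=lambda c: ucb[c], reverse=True)
        p_take = alp_probabilities(context_probs, ranked, b, horizon - t + 1)
        if b >= 1 and rng.random() < p_take[j]:
            reward = 1.0 if rng.random() < true_rewards[j] else 0.0  # Bernoulli reward
            counts[j] += 1
            means[j] += (reward - means[j]) / counts[j]
            b -= 1                                                   # unit cost per action
            total_reward += reward
    return total_reward

# Illustrative run with made-up arrival probabilities and mean rewards.
contexts = ["A", "B", "C"]
print(ucb_alp(contexts,
              {"A": 0.3, "B": 0.3, "C": 0.4},
              {"A": 0.8, "B": 0.5, "C": 0.2},
              budget=200, horizon=1000))
```

In this sketch, as in the abstract's description, only the ordering of the (estimated) expected rewards drives the decision rule; the linear-programming relaxation reduces to a threshold on the cumulative context-arrival probability compared with the budget rate.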
Pages: 9