Algorithms with Logarithmic or Sublinear Regret for Constrained Contextual Bandits

Cited by: 0
Authors
Wu, Huasen [1]
Srikant, R. [2]
Liu, Xin [1]
Jiang, Chong [2]
Affiliations
[1] Univ Calif Davis, Davis, CA 95616 USA
[2] Univ Illinois, Champaign, IL USA
DOI
Not available
CLC number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We study contextual bandits with budget and time constraints, referred to as constrained contextual bandits. The time and budget constraints significantly complicate the exploration-exploitation tradeoff because they introduce complex coupling among contexts over time. To gain insight, we first study unit-cost systems with a known context distribution. When the expected rewards are known, we develop an approximation of the oracle, referred to as Adaptive-Linear-Programming (ALP), which achieves near-optimality and requires only the ordering of the expected rewards. Given these desirable features, we then combine ALP with the upper-confidence-bound (UCB) method for the general case where the expected rewards are unknown a priori. We show that the proposed UCB-ALP algorithm achieves logarithmic regret except in certain boundary cases. Further, we design algorithms and obtain similar regret bounds for more general systems with unknown context distribution and heterogeneous costs. To the best of our knowledge, this is the first work that shows how to achieve logarithmic regret in constrained contextual bandits. Moreover, this work sheds light on the study of computationally efficient algorithms for general constrained contextual bandits.
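The abstract notes that ALP needs only the ordering of the expected rewards. One way to picture this is the minimal sketch below. It assumes unit costs, a known context distribution, and a single candidate action per context (rewards already reduced to the best action in each context). The function name `alp_probabilities`, its parameters, and the greedy threshold rule are illustrative assumptions, not code or notation taken from the paper: the sketch spends the average remaining budget per round on the highest-ranked contexts first and randomizes only at the threshold context.

```python
from typing import List


def alp_probabilities(
    expected_rewards: List[float],
    context_probs: List[float],
    remaining_budget: float,
    remaining_time: int,
) -> List[float]:
    """Per-context probability of taking the (unit-cost) action.

    Hypothetical helper: a greedy threshold rule under the assumptions
    stated in the lead-in, not the authors' implementation.
    """
    n = len(expected_rewards)
    if remaining_time <= 0 or remaining_budget <= 0:
        return [0.0] * n

    # Average budget available per remaining round; with unit costs at most
    # one unit can be spent per round, so cap it at 1.
    rho = min(1.0, remaining_budget / remaining_time)

    # Only the ranking of the expected rewards is used, never their values.
    order = sorted(range(n), key=lambda j: expected_rewards[j], reverse=True)

    probs = [0.0] * n
    mass_left = rho
    for j in order:
        if mass_left <= 1e-12:
            break
        if context_probs[j] <= 0:
            continue
        # Act with probability 1 while budget mass remains; the threshold
        # context receives the leftover fraction.
        take = min(1.0, mass_left / context_probs[j])
        probs[j] = take
        mass_left -= take * context_probs[j]
    return probs


if __name__ == "__main__":
    # Three contexts, 35 budget units left for 100 remaining rounds.
    print(alp_probabilities([0.9, 0.5, 0.2], [0.2, 0.3, 0.5], 35, 100))
```

In this example the highest-ranked context is always served, the second is served about half the time, and the lowest-ranked context is skipped. When rewards are unknown, a UCB-style variant would plausibly rank contexts by upper confidence bounds instead of true expected rewards; that substitution is likewise an assumption here.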
Pages: 9
Related papers
50 in total
  • [31] Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits
    Allen-Zhu, Zeyuan
    Bubeck, Sébastien
    Li, Yuanzhi
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [32] Contextual bandits with surrogate losses: Margin bounds and efficient algorithms
    Foster, Dylan J.
    Krishnamurthy, Akshay
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [33] Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability
    Saha, Aadirupa
    Krishnamurthy, Akshay
    INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 167, 2022, 167
  • [34] Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions
    He, Jiafan
    Zhou, Dongruo
    Zhang, Tong
    Gu, Quanquan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022
  • [35] Randomized Online PCA Algorithms with Regret Bounds that are Logarithmic in the Dimension
    Warmuth, Manfred K.
    Kuzmin, Dima
    JOURNAL OF MACHINE LEARNING RESEARCH, 2008, 9 : 2287 - 2320
  • [36] Randomized online PCA algorithms with regret bounds that are logarithmic in the dimension
    Warmuth, Manfred K.
    Kuzmin, Dima
    Journal of Machine Learning Research, 2008, 9 : 2287 - 2320
  • [37] Distributed Algorithms for Learning and Cognitive Medium Access with Logarithmic Regret
    Anandkumar, Animashree
    Michael, Nithin
    Tang, Kevin
    Swami, Ananthram
    IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2011, 29 (04) : 731 - 745
  • [38] Bandit algorithms: Letting go of logarithmic regret for statistical robustness
    Ashutosh, Kumar
    Nair, Jayakrishnan
    Kagrecha, Anmol
    Jagannathan, Krishna
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130 : 622 - +
  • [39] Best-of-Both-Worlds Algorithms for Linear Contextual Bandits
    Kuroki, Yuko
    Rumi, Alberto
    Tsuchiya, Taira
    Vitale, Fabio
    Cesa-Bianchi, Nicolo
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [40] A Multiplier Bootstrap Approach to Designing Robust Algorithms for Contextual Bandits
    Xie, Hong
    Tang, Qiao
    Zhu, Qingsheng
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (12) : 9887 - 9899