Algorithms with Logarithmic or Sublinear Regret for Constrained Contextual Bandits

Cited: 0
Authors
Wu, Huasen [1 ]
Srikant, R. [2 ]
Liu, Xin [1 ]
Jiang, Chong [2 ]
Affiliations
[1] Univ Calif Davis, Davis, CA 95616 USA
[2] Univ Illinois, Champaign, IL USA
Keywords: none
DOI: not available
CLC classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
We study contextual bandits with budget and time constraints, referred to as constrained contextual bandits. The time and budget constraints significantly complicate the exploration and exploitation tradeoff because they introduce complex coupling among contexts over time. To gain insight, we first study unit-cost systems with known context distribution. When the expected rewards are known, we develop an approximation of the oracle, referred to as Adaptive Linear Programming (ALP), which achieves near-optimality and requires only the ordering of expected rewards. With these highly desirable features, we then combine ALP with the upper-confidence-bound (UCB) method in the general case where the expected rewards are unknown a priori. We show that the proposed UCB-ALP algorithm achieves logarithmic regret except for certain boundary cases. Further, we design algorithms and obtain similar regret bounds for more general systems with unknown context distribution and heterogeneous costs. To the best of our knowledge, this is the first work that shows how to achieve logarithmic regret in constrained contextual bandits. Moreover, this work also sheds light on the study of computationally efficient algorithms for general constrained contextual bandits.
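As a rough illustration of the idea described in the abstract (not the paper's exact algorithm), the sketch below combines a UCB reward ranking with an ALP-style budget allocation in a simplified setting: unit costs, Bernoulli rewards, and a known context distribution. The function names `alp_probs` and `ucb_alp`, and all parameter values, are hypothetical choices for this sketch; the ALP step here only uses the ordering of estimated rewards, which is the property the abstract highlights.

```python
import math
import random


def alp_probs(context_probs, ranking, b, tau):
    """ALP-style allocation (simplified sketch): with remaining budget b and
    remaining time tau, the average budget per round is rho = b / tau.
    Serve contexts in (estimated) reward order until rho is exhausted;
    the boundary context is served with a fractional probability."""
    rho = b / tau
    probs = [0.0] * len(context_probs)
    remaining = rho
    for j in ranking:  # contexts sorted best-first by estimated reward
        if context_probs[j] <= 0:
            continue
        take = min(1.0, remaining / context_probs[j])
        probs[j] = take
        remaining -= take * context_probs[j]
        if remaining <= 1e-12:
            break
    return probs


def ucb_alp(true_rewards, context_probs, T, B, seed=0):
    """Simplified UCB-ALP sketch: rank contexts by UCB index, then use the
    ALP allocation to decide whether to spend one unit of budget on the
    observed context. Returns total reward collected over T rounds."""
    rng = random.Random(seed)
    J = len(true_rewards)
    counts = [0] * J    # plays per context
    sums = [0.0] * J    # cumulative observed reward per context
    b = B
    total = 0.0
    for t in range(1, T + 1):
        if b <= 0:
            break
        # Observe a context drawn from the known distribution.
        j = rng.choices(range(J), weights=context_probs)[0]

        def ucb(k):
            # Unplayed contexts get +inf to force initial exploration.
            if counts[k] == 0:
                return float("inf")
            return sums[k] / counts[k] + math.sqrt(2 * math.log(t) / counts[k])

        ranking = sorted(range(J), key=ucb, reverse=True)
        tau = T - t + 1
        p = alp_probs(context_probs, ranking, b, tau)
        if rng.random() < p[j]:          # take the action (unit cost)
            r = 1.0 if rng.random() < true_rewards[j] else 0.0
            counts[j] += 1
            sums[j] += r
            b -= 1
            total += r
    return total
```

With two equally likely contexts and rewards 0.9 vs. 0.1, a budget of B = T/2 makes the allocation ratio rho ≈ 0.5, so the sketch learns to spend essentially the whole budget on the high-reward context, which is the behavior the logarithmic-regret analysis formalizes.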
Pages: 9
Related papers (50 total)
  • [21] Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits
    Syrgkanis, Vasilis
    Luo, Haipeng
    Krishnamurthy, Akshay
    Schapire, Robert E.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [22] Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces
    Zhu, Yinglun
    Mineiro, Paul
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [23] No-Regret Algorithms for Heavy-Tailed Linear Bandits
    Medina, Andres Munoz
    Yang, Scott
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [24] Mostly Exploration-Free Algorithms for Contextual Bandits
    Bastani, Hamsa
    Bayati, Mohsen
    Khosravi, Khashayar
    MANAGEMENT SCIENCE, 2021, 67 (03) : 1329 - 1349
  • [25] Generalized Contextual Bandits With Latent Features: Algorithms and Applications
    Xu, Xiongxiao
    Xie, Hong
    Lui, John C. S.
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (08) : 4763 - 4775
  • [26] Instance-optimal PAC Algorithms for Contextual Bandits
    Li, Zhaoqi
    Ratliff, Lillian
    Nassif, Houssam
    Jamieson, Kevin
    Jain, Lalit
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [27] Provably Optimal Algorithms for Generalized Linear Contextual Bandits
    Li, Lihong
    Lu, Yu
    Zhou, Dengyong
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [28] Regret of Queueing Bandits
    Krishnasamy, Subhashini
    Sen, Rajat
    Johari, Ramesh
    Shakkottai, Sanjay
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [29] Constrained regret minimization for multi-criterion multi-armed bandits
    Kagrecha, Anmol
    Nair, Jayakrishnan
    Jagannathan, Krishna
    MACHINE LEARNING, 2023, 112 (02) : 431 - 458