Algorithms with Logarithmic or Sublinear Regret for Constrained Contextual Bandits

Cited: 0
Authors
Wu, Huasen [1 ]
Srikant, R. [2 ]
Liu, Xin [1 ]
Jiang, Chong [2 ]
Affiliations
[1] Univ Calif Davis, Davis, CA 95616 USA
[2] Univ Illinois, Champaign, IL USA
Keywords: none
DOI: not available
CLC classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
We study contextual bandits with budget and time constraints, referred to as constrained contextual bandits. The time and budget constraints significantly complicate the exploration and exploitation tradeoff because they introduce complex coupling among contexts over time. To gain insight, we first study unit-cost systems with known context distribution. When the expected rewards are known, we develop an approximation of the oracle, referred to as Adaptive Linear Programming (ALP), which achieves near-optimality and requires only the ordering of expected rewards. With these highly desirable features, we then combine ALP with the upper-confidence-bound (UCB) method in the general case where the expected rewards are unknown a priori. We show that the proposed UCB-ALP algorithm achieves logarithmic regret except for certain boundary cases. Further, we design algorithms and obtain similar regret bounds for more general systems with unknown context distribution and heterogeneous costs. To the best of our knowledge, this is the first work that shows how to achieve logarithmic regret in constrained contextual bandits. Moreover, this work also sheds light on the study of computationally efficient algorithms for general constrained contextual bandits.
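As a rough illustration of the idea described in the abstract (not the paper's exact algorithm), the sketch below combines a UCB reward ranking with an ALP-style budget allocation in a simplified setting: unit costs, Bernoulli rewards, and a known context distribution. The function names `alp_probs` and `ucb_alp`, and all parameter values, are hypothetical choices for this sketch; the ALP step here only uses the ordering of estimated rewards, which is the property the abstract highlights.

```python
import math
import random


def alp_probs(context_probs, ranking, b, tau):
    """ALP-style allocation (simplified sketch): with remaining budget b and
    remaining time tau, the average budget per round is rho = b / tau.
    Serve contexts in (estimated) reward order until rho is exhausted;
    the boundary context is served with a fractional probability."""
    rho = b / tau
    probs = [0.0] * len(context_probs)
    remaining = rho
    for j in ranking:  # contexts sorted best-first by estimated reward
        if context_probs[j] <= 0:
            continue
        take = min(1.0, remaining / context_probs[j])
        probs[j] = take
        remaining -= take * context_probs[j]
        if remaining <= 1e-12:
            break
    return probs


def ucb_alp(true_rewards, context_probs, T, B, seed=0):
    """Simplified UCB-ALP sketch: rank contexts by UCB index, then use the
    ALP allocation to decide whether to spend one unit of budget on the
    observed context. Returns total reward collected over T rounds."""
    rng = random.Random(seed)
    J = len(true_rewards)
    counts = [0] * J    # plays per context
    sums = [0.0] * J    # cumulative observed reward per context
    b = B
    total = 0.0
    for t in range(1, T + 1):
        if b <= 0:
            break
        # Observe a context drawn from the known distribution.
        j = rng.choices(range(J), weights=context_probs)[0]

        def ucb(k):
            # Unplayed contexts get +inf to force initial exploration.
            if counts[k] == 0:
                return float("inf")
            return sums[k] / counts[k] + math.sqrt(2 * math.log(t) / counts[k])

        ranking = sorted(range(J), key=ucb, reverse=True)
        tau = T - t + 1
        p = alp_probs(context_probs, ranking, b, tau)
        if rng.random() < p[j]:          # take the action (unit cost)
            r = 1.0 if rng.random() < true_rewards[j] else 0.0
            counts[j] += 1
            sums[j] += r
            b -= 1
            total += r
    return total
```

With two equally likely contexts and rewards 0.9 vs. 0.1, a budget of B = T/2 makes the allocation ratio rho ≈ 0.5, so the sketch learns to spend essentially the whole budget on the high-reward context, which is the behavior the logarithmic-regret analysis formalizes.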
Pages: 9
Related papers (50 total)
  • [21] Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits
    Syrgkanis, Vasilis
    Luo, Haipeng
    Krishnamurthy, Akshay
    Schapire, Robert E.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [22] Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces
    Zhu, Yinglun
    Mineiro, Paul
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [23] No-Regret Algorithms for Heavy-Tailed Linear Bandits
    Medina, Andres Munoz
    Yang, Scott
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [24] Mostly Exploration-Free Algorithms for Contextual Bandits
    Bastani, Hamsa
    Bayati, Mohsen
    Khosravi, Khashayar
    MANAGEMENT SCIENCE, 2021, 67 (03) : 1329 - 1349
  • [25] Generalized Contextual Bandits With Latent Features: Algorithms and Applications
    Xu, Xiongxiao
    Xie, Hong
    Lui, John C. S.
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (08) : 4763 - 4775
  • [26] Instance-optimal PAC Algorithms for Contextual Bandits
    Li, Zhaoqi
    Ratliff, Lillian
    Nassif, Houssam
    Jamieson, Kevin
    Jain, Lalit
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [27] Provably Optimal Algorithms for Generalized Linear Contextual Bandits
    Li, Lihong
    Lu, Yu
    Zhou, Dengyong
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [28] Regret of Queueing Bandits
    Krishnasamy, Subhashini
    Sen, Rajat
    Johari, Ramesh
    Shakkottai, Sanjay
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [29] Constrained regret minimization for multi-criterion multi-armed bandits
    Kagrecha, Anmol
    Nair, Jayakrishnan
    Jagannathan, Krishna
    MACHINE LEARNING, 2023, 112 (02) : 431 - 458