Algorithms with Logarithmic or Sublinear Regret for Constrained Contextual Bandits

Cited by: 0

Authors:

Wu, Huasen [1]

Srikant, R. [2]

Liu, Xin [1]

Jiang, Chong [2]

Affiliations:

[1] Univ Calif Davis, Davis, CA 95616 USA

[2] Univ Illinois, Champaign, IL USA

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015) | 2015 / Vol. 28

Keywords:

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We study contextual bandits with budget and time constraints, referred to as constrained contextual bandits. The time and budget constraints significantly complicate the exploration and exploitation tradeoff because they introduce complex coupling among contexts over time. To gain insight, we first study unit-cost systems with known context distribution. When the expected rewards are known, we develop an approximation of the oracle, referred to as Adaptive-Linear-Programming (ALP), which achieves near-optimality and only requires the ordering of expected rewards. With these highly desirable features, we then combine ALP with the upper-confidence-bound (UCB) method in the general case where the expected rewards are unknown a priori. We show that the proposed UCB-ALP algorithm achieves logarithmic regret except for certain boundary cases. Further, we design algorithms and obtain similar regret bounds for more general systems with unknown context distribution and heterogeneous costs. To the best of our knowledge, this is the first work that shows how to achieve logarithmic regret in constrained contextual bandits. Moreover, this work also sheds light on the study of computationally efficient algorithms for general constrained contextual bandits.
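To make the abstract's remark that ALP "only requires the ordering of expected rewards" concrete, here is a minimal sketch for the unit-cost setting with a known context distribution. It is an illustration under stated assumptions, not the paper's pseudocode: the function name `alp_probabilities`, its parameters, and the greedy fractional-knapsack form of the linear-program solution are all hypothetical choices made here. The idea is to spread the remaining budget evenly over the remaining rounds and spend that average budget on the highest-ranked contexts first.

```python
def alp_probabilities(context_probs, rewards, budget_left, time_left):
    """Return, per context, the probability of taking the action this round.

    Assumed setup (not taken verbatim from the paper): unit cost per action,
    known context distribution `context_probs` (all entries > 0), and an
    average-budget constraint sum_j context_probs[j] * p[j] <= rho.
    """
    if time_left <= 0:
        raise ValueError("time_left must be positive")
    # Average remaining budget per round, capped at 1 (one unit-cost action per round).
    remaining = min(budget_left / time_left, 1.0)
    # Only the ranking of the rewards is used, never their magnitudes.
    order = sorted(range(len(rewards)), key=lambda j: rewards[j], reverse=True)
    probs = [0.0] * len(rewards)
    for j in order:
        if remaining <= 0:
            break
        # Spend as much of the per-round budget on context j as it can absorb.
        take = min(context_probs[j], remaining)
        probs[j] = take / context_probs[j]
        remaining -= take
    return probs


# Two equally likely contexts; the first has the higher expected reward.
print(alp_probabilities([0.5, 0.5], [1.0, 0.2], budget_left=25, time_left=100))
print(alp_probabilities([0.5, 0.5], [1.0, 0.2], budget_left=75, time_left=100))
```

In a UCB-style variant of this sketch, the `rewards` argument would presumably be replaced by upper-confidence-bound estimates, with the same ranking-based allocation applied to them; that substitution is likewise an assumption about how the two pieces fit together.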


Pages: 9

50 items in total

[31] Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits
Allen-Zhu, Zeyuan
Bubeck, Sebastien
Li, Yuanzhi
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
[32] Contextual bandits with surrogate losses: Margin bounds and efficient algorithms
Foster, Dylan J.
Krishnamurthy, Akshay
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
[33] Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability
Saha, Aadirupa
Krishnamurthy, Akshay
INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 167, 2022, 167
[34] Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions
He, Jiafan
Zhou, Dongruo
Zhang, Tong
Gu, Quanquan
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
[35] Randomized Online PCA Algorithms with Regret Bounds that are Logarithmic in the Dimension
Warmuth, Manfred K.
Kuzmin, Dima
JOURNAL OF MACHINE LEARNING RESEARCH, 2008, 9 : 2287 - 2320
[36] Randomized online PCA algorithms with regret bounds that are logarithmic in the dimension
Warmuth, Manfred K.
Kuzmin, Dima
Journal of Machine Learning Research, 2008, 9 : 2287 - 2320
[37] Distributed Algorithms for Learning and Cognitive Medium Access with Logarithmic Regret
Anandkumar, Animashree
Michael, Nithin
Tang, Kevin
Swami, Ananthram
IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2011, 29 (04) : 731 - 745
[38] Bandit algorithms: Letting go of logarithmic regret for statistical robustness
Ashutosh, Kumar
Nair, Jayakrishnan
Kagrecha, Anmol
Jagannathan, Krishna
24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130 : 622 - +
[39] Best-of-Both-Worlds Algorithms for Linear Contextual Bandits
Kuroki, Yuko
Rumi, Alberto
Tsuchiya, Taira
Vitale, Fabio
Cesa-Bianchi, Nicolo
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
[40] A Multiplier Bootstrap Approach to Designing Robust Algorithms for Contextual Bandits
Xie, Hong
Tang, Qiao
Zhu, Qingsheng
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (12) : 9887 - 9899