Algorithms with Logarithmic or Sublinear Regret for Constrained Contextual Bandits

Cited by: 0
Authors
Wu, Huasen [1]
Srikant, R. [2]
Liu, Xin [1]
Jiang, Chong [2]
Affiliations
[1] Univ Calif Davis, Davis, CA 95616 USA
[2] Univ Illinois, Champaign, IL USA
Keywords
DOI
Not available
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study contextual bandits with budget and time constraints, referred to as constrained contextual bandits. The time and budget constraints significantly complicate the exploration and exploitation tradeoff because they introduce complex coupling among contexts over time. To gain insight, we first study unit-cost systems with a known context distribution. When the expected rewards are known, we develop an approximation of the oracle, referred to as Adaptive Linear Programming (ALP), which achieves near-optimality and requires only the ordering of the expected rewards. With these highly desirable features, we then combine ALP with the upper-confidence-bound (UCB) method in the general case where the expected rewards are unknown a priori. We show that the proposed UCB-ALP algorithm achieves logarithmic regret except in certain boundary cases. Further, we design algorithms and obtain similar regret bounds for more general systems with unknown context distributions and heterogeneous costs. To the best of our knowledge, this is the first work to show how to achieve logarithmic regret in constrained contextual bandits. Moreover, this work also sheds light on the study of computationally efficient algorithms for general constrained contextual bandits.
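
To make the ALP and UCB-ALP ideas in the abstract concrete, the sketch below is a minimal, illustrative Python rendering of the threshold structure for the unit-cost, known-context-distribution setting: rank contexts by their UCB reward estimates, take the highest-ranked contexts with probability one until the per-round budget rate (remaining budget over remaining time) is filled, and randomize at the boundary context. The function names, the simplified confidence radius, and the Bernoulli reward model are assumptions for illustration only; the paper's actual algorithm (its exact confidence terms, boundary-case handling, and the extensions to unknown context distributions and heterogeneous costs) is not reproduced here.

```python
import math
import random

def alp_probabilities(context_probs, ranked_contexts, budget, remaining_time):
    """ALP-style take-probabilities for unit-cost contexts: serve the
    highest-ranked contexts fully until the average budget rate
    budget / remaining_time is exhausted, then randomize at the boundary."""
    rho = budget / max(remaining_time, 1)           # average budget per remaining round
    probs = {j: 0.0 for j in ranked_contexts}
    used = 0.0
    for j in ranked_contexts:                       # sorted by estimated reward, descending
        pj = context_probs[j]
        if used + pj <= rho:
            probs[j] = 1.0                          # always take when this context arrives
            used += pj
        else:
            probs[j] = max(0.0, (rho - used) / pj)  # fractional take at the boundary context
            break
    return probs

def ucb_alp(contexts, context_probs, true_rewards, budget, horizon, seed=0):
    """Toy UCB-ALP loop: maintain UCB indices per context, order contexts
    by the indices, and act according to the ALP probabilities."""
    rng = random.Random(seed)
    counts = {j: 0 for j in contexts}
    means = {j: 0.0 for j in contexts}
    b, total_reward = budget, 0.0
    for t in range(1, horizon + 1):
        # Context arrives i.i.d. from the (assumed known) distribution.
        j = rng.choices(contexts, weights=[context_probs[c] for c in contexts])[0]
        ucb = {c: means[c] + math.sqrt(math.log(t + 1) / (2 * max(counts[c], 1)))
               for c in contexts}
        ranked = sorted(contexts, key=lambda c: ucb[c], reverse=True)
        p_take = alp_probabilities(context_probs, ranked, b, horizon - t + 1)
        if b >= 1 and rng.random() < p_take[j]:
            reward = 1.0 if rng.random() < true_rewards[j] else 0.0  # Bernoulli reward
            counts[j] += 1
            means[j] += (reward - means[j]) / counts[j]
            b -= 1                                                   # unit cost per action
            total_reward += reward
    return total_reward

# Illustrative run with made-up arrival probabilities and mean rewards.
contexts = ["A", "B", "C"]
print(ucb_alp(contexts,
              {"A": 0.3, "B": 0.3, "C": 0.4},
              {"A": 0.8, "B": 0.5, "C": 0.2},
              budget=200, horizon=1000))
```

In this sketch, as in the abstract's description, only the ordering of the (estimated) expected rewards drives the decision rule; the linear-programming relaxation reduces to a threshold on the cumulative context-arrival probability compared with the budget rate.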
Pages: 9