Stochastic Conservative Contextual Linear Bandits

Cited by: 0

Authors:

Lin, Jiabin [1]

Lee, Xian Yeow [2]

Jubery, Talukder [2]

Moothedath, Shana [1]

Sarkar, Soumik [2]

Ganapathysubramanian, Baskar [2]

Affiliations:

[1] Iowa State Univ, Dept Elect & Comp Engn, Ames, IA 50011 USA

[2] Iowa State Univ, Dept Mech Engn, Ames, IA USA

Source:

2022 IEEE 61st Conference on Decision and Control (CDC) | 2022

Keywords:

DOI:

10.1109/CDC51059.2022.9993209

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

In this paper, we propose a conservative stochastic contextual bandit formulation for real-time decision making in which an adversary chooses a distribution on the set of possible contexts and the learner is subject to certain safety/performance constraints. The learner observes only the context distribution; the exact context is unknown, for instance when the context itself is a noisy measurement or the output of a forecasting mechanism. The goal is to develop an algorithm that selects a sequence of actions to maximize the cumulative reward without violating the safety constraints at any time step. Building on the Upper Confidence Bound (UCB) algorithm, we propose a conservative linear UCB algorithm for stochastic bandits with context distribution. We prove an upper bound on the regret of the algorithm and show that it can be decomposed into three terms: (i) an upper bound on the regret of the standard linear UCB algorithm, (ii) a constant term (independent of the time horizon) that accounts for the loss from being conservative in order to satisfy the safety constraint, and (iii) a constant term (independent of the time horizon) that accounts for the loss from the contexts being unknown, with only their distribution known. To validate the performance of our approach, we perform numerical simulations on synthetic data and on real-world maize data collected through the Genomes to Fields (G2F) initiative.
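
The idea summarized in the abstract can be illustrated with a rough sketch. This is not the authors' algorithm: it is a minimal illustration that assumes (a) the learner is handed the expected feature vector of each action under the observed context distribution, (b) a baseline action with a known expected reward is always available, and (c) the safety constraint requires the cumulative reward to stay above a fraction (1 - alpha) of the baseline's cumulative reward. The function name `conservative_linucb` and all parameters (`alpha`, `beta`, `lam`, `noise_std`) are hypothetical, and rewards are simulated from the expected features for simplicity.

```python
import numpy as np


def conservative_linucb(expected_features, theta_star, baseline_reward,
                        alpha=0.1, beta=1.0, lam=1.0, noise_std=0.1, seed=0):
    """Illustrative conservative linear UCB on expected features.

    expected_features: array of shape (T, K, d); entry [t, a] is the feature
        vector of action a averaged over the round-t context distribution.
    theta_star: true parameter, used here only to simulate rewards.
    baseline_reward: assumed-known expected reward of the baseline action.
    Returns a list of chosen actions, with -1 marking a baseline play.
    """
    rng = np.random.default_rng(seed)
    T, K, d = expected_features.shape
    V = lam * np.eye(d)      # regularized Gram matrix
    b = np.zeros(d)          # sum of reward-weighted features
    z = np.zeros(d)          # sum of features over non-baseline plays
    n_baseline = 0
    choices = []

    for t in range(T):
        psi = expected_features[t]
        V_inv = np.linalg.inv(V)
        theta_hat = V_inv @ b

        # Optimistic (UCB) action on the expected features.
        width = beta * np.sqrt(np.einsum("kd,de,ke->k", psi, V_inv, psi))
        a = int(np.argmax(psi @ theta_hat + width))

        # Pessimistic estimate of the cumulative reward if we play action a.
        z_cand = z + psi[a]
        lcb = z_cand @ theta_hat - beta * np.sqrt(z_cand @ V_inv @ z_cand)
        required = (1.0 - alpha) * (t + 1) * baseline_reward
        safe = lcb + n_baseline * baseline_reward >= required

        if safe:
            reward = psi[a] @ theta_star + noise_std * rng.standard_normal()
            V += np.outer(psi[a], psi[a])
            b += reward * psi[a]
            z = z_cand
            choices.append(a)
        else:
            # Fall back to the baseline; the model is not updated.
            n_baseline += 1
            choices.append(-1)
    return choices


if __name__ == "__main__":
    demo_rng = np.random.default_rng(1)
    feats = demo_rng.uniform(0.0, 1.0, size=(200, 4, 3))
    plays = conservative_linucb(feats, np.array([0.5, 0.3, 0.2]),
                                baseline_reward=0.5)
    print("baseline plays:", plays.count(-1), "of", len(plays))
```

In this sketch the learner plays the baseline in early rounds, while its confidence intervals are wide, and switches to optimistic actions once the pessimistic estimate of cumulative reward clears the safety threshold.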

Pages: 7321-7326

Page count: 6
