Stochastic Conservative Contextual Linear Bandits

Cited by: 0

Authors:

Lin, Jiabin [1]
Lee, Xian Yeow [2]
Jubery, Talukder [2]
Moothedath, Shana [1]
Sarkar, Soumik [2]
Ganapathysubramanian, Baskar [2]

Affiliations:

[1] Iowa State Univ, Dept Elect & Comp Engn, Ames, IA 50011 USA

[2] Iowa State Univ, Dept Mech Engn, Ames, IA USA

Source:

2022 IEEE 61st Conference on Decision and Control (CDC) | 2022

DOI:

10.1109/CDC51059.2022.9993209

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

In this paper, we study a conservative stochastic contextual bandit formulation for real-time decision making in which an adversary chooses a distribution on the set of possible contexts and the learner is subject to certain safety/performance constraints. The learner observes only the context distribution; the exact context is unknown, for instance when the context itself is a noisy measurement or the output of a forecasting mechanism. The goal is to develop an algorithm that selects a sequence of actions that maximizes the cumulative reward without violating the safety constraints at any time step. Building on the Upper Confidence Bound (UCB) algorithm, we propose a conservative linear UCB algorithm for stochastic bandits with context distribution. We prove an upper bound on the regret of the algorithm and show that it decomposes into three terms: (i) an upper bound for the regret of the standard linear UCB algorithm, (ii) a constant term (independent of the time horizon) that accounts for the loss of being conservative in order to satisfy the safety constraint, and (iii) a constant term (independent of the time horizon) that accounts for the loss incurred because the contexts are unknown and only their distribution is known. To validate the performance of our approach, we perform numerical simulations on synthetic data and on real-world maize data collected through the Genomes to Fields (G2F) initiative.
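
The idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it assumes a known, constant baseline reward, a fixed confidence width `beta`, a performance constraint of the form "cumulative reward stays above a (1 - alpha) fraction of the baseline's cumulative reward", and that the unknown context is handled by averaging features over samples from the observed context distribution. The names `expected_features`, `ConservativeLinUCB`, `alpha`, `beta`, and `reg` are all hypothetical.

```python
import numpy as np


def expected_features(feature_map, context_samples, n_actions):
    """Average phi(a, c) over samples drawn from the context distribution.

    Assumption: the exact context is never observed, so each action is
    represented by its expected feature vector under the distribution.
    """
    return np.array([
        np.mean([feature_map(a, c) for c in context_samples], axis=0)
        for a in range(n_actions)
    ])


class ConservativeLinUCB:
    """Illustrative conservative linear UCB learner (hypothetical sketch)."""

    def __init__(self, dim, baseline_reward, alpha=0.1, beta=1.0, reg=1.0):
        self.baseline_reward = baseline_reward  # assumed known and constant
        self.alpha = alpha          # tolerated fractional loss vs. baseline
        self.beta = beta            # confidence width, fixed for simplicity
        self.V = reg * np.eye(dim)  # regularized Gram matrix
        self.b = np.zeros(dim)      # sum of reward-weighted features
        self.z = np.zeros(dim)      # sum of features of non-baseline plays
        self.n_baseline = 0         # rounds in which the baseline was played
        self.t = 0                  # rounds completed so far

    def select(self, psi):
        """Return the optimistic action index, or None to play the baseline."""
        V_inv = np.linalg.inv(self.V)
        theta_hat = V_inv @ self.b
        widths = np.sqrt(np.einsum("ij,jk,ik->i", psi, V_inv, psi))
        a = int(np.argmax(psi @ theta_hat + self.beta * widths))

        # Pessimistic estimate of the reward collected by all non-baseline
        # plays, including the candidate action.
        z = self.z + psi[a]
        lcb = z @ theta_hat - self.beta * np.sqrt(z @ V_inv @ z)

        required = (1.0 - self.alpha) * self.baseline_reward * (self.t + 1)
        if lcb + self.n_baseline * self.baseline_reward >= required:
            return a
        return None

    def update(self, psi_a=None, reward=None):
        """Record one round; call with no arguments after a baseline play."""
        self.t += 1
        if psi_a is None:
            self.n_baseline += 1
            return
        self.V += np.outer(psi_a, psi_a)
        self.b += reward * psi_a
        self.z += psi_a
```

In this sketch the learner falls back to the baseline whenever the pessimistic estimate of its cumulative reward would drop below the required fraction of the baseline's, so it explores only after building up enough margin. A practical implementation would replace the fixed `beta` with a time-dependent confidence radius.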

Pages: 7321-7326

Page count: 6
