Stochastic Conservative Contextual Linear Bandits

Cited: 0
Authors
Lin, Jiabin [1 ]
Lee, Xian Yeow [2 ]
Jubery, Talukder [2 ]
Moothedath, Shana [1 ]
Sarkar, Soumik [2 ]
Ganapathysubramanian, Baskar [2 ]
Affiliations
[1] Iowa State Univ, Dept Elect & Comp Engn, Ames, IA 50011 USA
[2] Iowa State Univ, Dept Mech Engn, Ames, IA USA
DOI
10.1109/CDC51059.2022.9993209
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
In this paper, we formulate a conservative stochastic contextual bandit problem for real-time decision making in which an adversary chooses a distribution over the set of possible contexts and the learner is subject to certain safety/performance constraints. The learner observes only the context distribution, while the exact context remains unknown, for instance when the context itself is a noisy measurement or the output of a forecasting mechanism. The goal is to develop an algorithm that selects a sequence of optimal actions to maximize the cumulative reward without violating the safety constraints at any time step. By leveraging the Upper Confidence Bound (UCB) algorithm for this setting, we propose a conservative linear UCB algorithm for stochastic bandits with context distributions. We prove an upper bound on the regret of the algorithm and show that it can be decomposed into three terms: (i) an upper bound on the regret of the standard linear UCB algorithm; (ii) a constant term, independent of the time horizon, that accounts for the loss of being conservative in order to satisfy the safety constraint; and (iii) a constant term, independent of the time horizon, that accounts for the loss incurred because the contexts are unknown and only their distribution is known. To validate the performance of our approach, we perform numerical simulations on synthetic data and on real-world maize data collected through the Genomes to Fields (G2F) initiative.
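The core idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's exact algorithm: the confidence width `beta`, the noise levels, the Gaussian context model, and the assumption that the baseline arm's mean reward `r0` is known exactly are all illustrative choices. The learner never sees the realized context, so both the UCB scores and the regression updates use only each arm's expected feature vector, and a pessimistic lower bound on collected reward enforces the conservative constraint at every step.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 3, 5, 400
alpha = 0.1                       # fraction of baseline performance we may sacrifice
theta = rng.normal(size=d)        # hidden reward parameter
theta /= np.linalg.norm(theta)

# Arm context distributions: the learner knows only the mean feature of each arm.
means = rng.normal(size=(K, d))
means[0] = 0.3 * theta            # arm 0 is the safe baseline action
r0 = float(means[0] @ theta)      # baseline mean reward (assumed known), here 0.3

A, b = np.eye(d), np.zeros(d)     # ridge-regression sufficient statistics
beta = 1.0                        # confidence width (tuning parameter)
pess = 0.0                        # pessimistic lower bound on reward collected so far
history = []
for t in range(T):
    theta_hat = np.linalg.solve(A, b)
    Ainv = np.linalg.inv(A)
    # Confidence width per arm: beta * sqrt(mu_k^T A^{-1} mu_k)
    w = beta * np.sqrt(np.einsum('kd,de,ke->k', means, Ainv, means))
    ucb, lcb = means @ theta_hat + w, means @ theta_hat - w
    a = int(np.argmax(ucb))
    # Conservative check: even under the pessimistic estimate, cumulative reward
    # must stay above a (1 - alpha) fraction of the baseline's cumulative reward.
    if pess + lcb[a] >= (1 - alpha) * (t + 1) * r0:
        pess += lcb[a]            # optimistic arm is provably safe to play
    else:
        a = 0                     # fall back to the known baseline action
        pess += r0
    history.append(a)
    # Environment: the true context is mean + noise; the learner never observes
    # it, so the regression update uses the expected feature vector instead.
    x = means[a] + 0.05 * rng.normal(size=d)
    r = x @ theta + 0.05 * rng.normal()
    A += np.outer(means[a], means[a])
    b += r * means[a]

# The safety invariant holds by construction of the conservative check.
assert pess >= (1 - alpha) * T * r0 - 1e-9
```

By induction, if `pess >= (1 - alpha) * t * r0` before a step, it still holds afterwards: the optimistic branch is only taken when the check passes, and the baseline branch adds the full `r0` while the budget grows by only `(1 - alpha) * r0`. This mirrors the decomposition in the abstract: the fallback rounds contribute the constant conservatism term, while the use of expected features in place of exact contexts contributes the constant context-uncertainty term.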
Pages: 7321-7326
Number of pages: 6
Related Papers
50 items in total
  • [21] Federated Linear Contextual Bandits with Heterogeneous Clients
    Blaser, Ethan
    Li, Chuanhao
    Wang, Hongning
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [22] Group Meritocratic Fairness in Linear Contextual Bandits
    Grazzi, Riccardo
    Akhavan, Arya
    Falk, John Isak Texas
    Cella, Leonardo
    Pontil, Massimiliano
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [23] Linear Contextual Bandits with Hybrid Payoff: Revisited
    Das, Nirjhar
    Sinha, Gaurav
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES-RESEARCH TRACK, PT VI, ECML PKDD 2024, 2024, 14946 : 441 - 455
  • [24] Smoothed Adversarial Linear Contextual Bandits with Knapsacks
    Sivakumar, Vidyashankar
    Zuo, Shiliang
    Banerjee, Arindam
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [25] Leveraging Good Representations in Linear Contextual Bandits
    Papini, Matteo
    Tirinzoni, Andrea
    Restelli, Marcello
    Lazaric, Alessandro
    Pirotta, Matteo
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [26] Breaking the √T Barrier: Instance-Independent Logarithmic Regret in Stochastic Contextual Linear Bandits
    Ghosh, Avishek
    Sankararaman, Abishek
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [27] Stage-wise Conservative Linear Bandits
    Moradipari, Ahmadreza
    Thrampoulidis, Christos
    Alizadeh, Mahnoosh
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33 (NEURIPS 2020), 2020, 33
  • [28] Robust and private stochastic linear bandits
    Charisopoulos, Vasileios
    Esfandiari, Hossein
    Mirrokni, Vahab
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202, 2023, 202
  • [29] When Are Linear Stochastic Bandits Attackable?
    Wang, Huazheng
    Xu, Haifeng
    Wang, Hongning
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [30] Privacy Amplification via Shuffling for Linear Contextual Bandits
    Garcelon, Evrard
    Chaudhuri, Kamalika
    Perchet, Vianney
    Pirotta, Matteo
    INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 167, 2022, 167