Stochastic Conservative Contextual Linear Bandits

Cited by: 0
Authors
Lin, Jiabin [1 ]
Lee, Xian Yeow [2 ]
Jubery, Talukder [2 ]
Moothedath, Shana [1 ]
Sarkar, Soumik [2 ]
Ganapathysubramanian, Baskar [2 ]
Affiliations
[1] Iowa State Univ, Dept Elect & Comp Engn, Ames, IA 50011 USA
[2] Iowa State Univ, Dept Mech Engn, Ames, IA USA
DOI
10.1109/CDC51059.2022.9993209
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology];
Discipline Code
0812 ;
Abstract
In this paper, we study a conservative stochastic contextual bandit formulation for real-time decision making in which an adversary chooses a distribution over the set of possible contexts and the learner is subject to certain safety/performance constraints. The learner observes only the context distribution, while the exact context remains unknown, for instance when the context is itself a noisy measurement or the output of a forecasting mechanism. The goal is to develop an algorithm that selects a sequence of optimal actions to maximize the cumulative reward without violating the safety constraints at any time step. By leveraging the Upper Confidence Bound (UCB) algorithm for this setting, we propose a conservative linear UCB algorithm for stochastic bandits with context distributions. We prove an upper bound on the regret of the algorithm and show that it can be decomposed into three terms: (i) an upper bound on the regret of the standard linear UCB algorithm, (ii) a constant term (independent of the time horizon) that accounts for the loss incurred by being conservative in order to satisfy the safety constraint, and (iii) a constant term (independent of the time horizon) that accounts for the loss incurred because the contexts are unknown and only their distribution is known. To validate the performance of our approach, we perform numerical simulations on synthetic data and on real-world maize data collected through the Genomes to Fields (G2F) initiative.
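The paper's exact algorithm is not reproduced in this record; the following is a minimal sketch of a conservative LinUCB-style loop consistent with the abstract, in which the learner sees only mean feature vectors (standing in for the unobserved contexts) and falls back to a baseline action whenever the optimistic choice could violate the safety constraint. The synthetic data, the constants `alpha` and `beta`, and the known-reward baseline are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 4, 500
alpha = 0.2                       # fraction of baseline reward we may sacrifice

theta = rng.normal(size=d)
theta[0] = abs(theta[0])          # ensure the baseline's expected reward is positive
theta /= np.linalg.norm(theta)    # unknown true parameter

# Baseline policy with known expected reward (a modeling assumption here).
x_base = np.zeros(d)
x_base[0] = 1.0
r_base = float(theta @ x_base)

A = np.eye(d)                     # ridge-regularized Gram matrix
b = np.zeros(d)
beta = 1.0                        # confidence-width constant (untuned)
pessimistic_sum = 0.0             # pessimistic estimate of reward earned so far

history = []
for t in range(1, T + 1):
    # Only the context distribution is revealed: the learner acts on the
    # mean feature vector of each arm under that distribution.
    mean_x = rng.normal(size=(K, d))

    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b
    widths = beta * np.sqrt(np.einsum("kd,de,ke->k", mean_x, A_inv, mean_x))
    ucb = mean_x @ theta_hat + widths
    lcb = mean_x @ theta_hat - widths

    k = int(np.argmax(ucb))       # optimistic candidate arm

    # Conservative check: even under a pessimistic reward estimate, total
    # performance must stay above (1 - alpha) of the baseline's so far.
    if pessimistic_sum + lcb[k] >= (1.0 - alpha) * t * r_base:
        x = mean_x[k]
        pessimistic_sum += lcb[k]
    else:
        x = x_base                # fall back to the safe baseline action
        pessimistic_sum += r_base

    reward = float(theta @ x) + 0.1 * rng.normal()
    A += np.outer(x, x)
    b += reward * x
    history.append(reward)
```

The fallback branch is what makes the regret decomposition in the abstract plausible: rounds spent on the baseline contribute the constant "conservatism" term, while the remaining rounds behave like standard linear UCB on mean features.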
Pages: 7321-7326
Number of pages: 6