Stochastic Conservative Contextual Linear Bandits

Cited by: 0
Authors
Lin, Jiabin [1 ]
Lee, Xian Yeow [2 ]
Jubery, Talukder [2 ]
Moothedath, Shana [1 ]
Sarkar, Soumik [2 ]
Ganapathysubramanian, Baskar [2 ]
Affiliations
[1] Iowa State Univ, Dept Elect & Comp Engn, Ames, IA 50011 USA
[2] Iowa State Univ, Dept Mech Engn, Ames, IA USA
DOI
10.1109/CDC51059.2022.9993209
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology];
Discipline Code
0812 ;
Abstract
In this paper, we study a conservative stochastic contextual bandit formulation for real-time decision making in which an adversary chooses a distribution over the set of possible contexts and the learner is subject to certain safety/performance constraints. The learner observes only the context distribution, while the exact context remains unknown, for instance when the context is itself a noisy measurement or the output of a forecasting mechanism. The goal is to develop an algorithm that selects a sequence of optimal actions to maximize the cumulative reward without violating the safety constraints at any time step. By leveraging the Upper Confidence Bound (UCB) algorithm for this setting, we propose a conservative linear UCB algorithm for stochastic bandits with context distributions. We prove an upper bound on the regret of the algorithm and show that it can be decomposed into three terms: (i) an upper bound on the regret of the standard linear UCB algorithm, (ii) a constant term (independent of the time horizon) that accounts for the loss incurred by being conservative in order to satisfy the safety constraint, and (iii) a constant term (independent of the time horizon) that accounts for the loss incurred because the contexts are unknown and only their distribution is known. To validate the performance of our approach, we perform numerical simulations on synthetic data and on real-world maize data collected through the Genomes to Fields (G2F) initiative.
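The paper's exact algorithm is not reproduced in this record; the following is a minimal sketch of a conservative LinUCB-style loop consistent with the abstract, in which the learner sees only mean feature vectors (standing in for the unobserved contexts) and falls back to a baseline action whenever the optimistic choice could violate the safety constraint. The synthetic data, the constants `alpha` and `beta`, and the known-reward baseline are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 4, 500
alpha = 0.2                       # fraction of baseline reward we may sacrifice

theta = rng.normal(size=d)
theta[0] = abs(theta[0])          # ensure the baseline's expected reward is positive
theta /= np.linalg.norm(theta)    # unknown true parameter

# Baseline policy with known expected reward (a modeling assumption here).
x_base = np.zeros(d)
x_base[0] = 1.0
r_base = float(theta @ x_base)

A = np.eye(d)                     # ridge-regularized Gram matrix
b = np.zeros(d)
beta = 1.0                        # confidence-width constant (untuned)
pessimistic_sum = 0.0             # pessimistic estimate of reward earned so far

history = []
for t in range(1, T + 1):
    # Only the context distribution is revealed: the learner acts on the
    # mean feature vector of each arm under that distribution.
    mean_x = rng.normal(size=(K, d))

    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b
    widths = beta * np.sqrt(np.einsum("kd,de,ke->k", mean_x, A_inv, mean_x))
    ucb = mean_x @ theta_hat + widths
    lcb = mean_x @ theta_hat - widths

    k = int(np.argmax(ucb))       # optimistic candidate arm

    # Conservative check: even under a pessimistic reward estimate, total
    # performance must stay above (1 - alpha) of the baseline's so far.
    if pessimistic_sum + lcb[k] >= (1.0 - alpha) * t * r_base:
        x = mean_x[k]
        pessimistic_sum += lcb[k]
    else:
        x = x_base                # fall back to the safe baseline action
        pessimistic_sum += r_base

    reward = float(theta @ x) + 0.1 * rng.normal()
    A += np.outer(x, x)
    b += reward * x
    history.append(reward)
```

The fallback branch is what makes the regret decomposition in the abstract plausible: rounds spent on the baseline contribute the constant "conservatism" term, while the remaining rounds behave like standard linear UCB on mean features.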
Pages: 7321-7326
Number of pages: 6