Stochastic Conservative Contextual Linear Bandits

Cited: 0
Authors
Lin, Jiabin [1 ]
Lee, Xian Yeow [2 ]
Jubery, Talukder [2 ]
Moothedath, Shana [1 ]
Sarkar, Soumik [2 ]
Ganapathysubramanian, Baskar [2 ]
Affiliations
[1] Iowa State Univ, Dept Elect & Comp Engn, Ames, IA 50011 USA
[2] Iowa State Univ, Dept Mech Engn, Ames, IA USA
DOI
10.1109/CDC51059.2022.9993209
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
In this paper, we formulate a conservative stochastic contextual bandit problem for real-time decision making in which an adversary chooses a distribution over the set of possible contexts and the learner is subject to certain safety/performance constraints. The learner observes only the context distribution, while the exact context remains unknown, for instance when the context itself is a noisy measurement or the output of a forecasting mechanism. The goal is to develop an algorithm that selects a sequence of optimal actions to maximize the cumulative reward without violating the safety constraints at any time step. By leveraging the Upper Confidence Bound (UCB) algorithm for this setting, we propose a conservative linear UCB algorithm for stochastic bandits with context distributions. We prove an upper bound on the regret of the algorithm and show that it can be decomposed into three terms: (i) an upper bound on the regret of the standard linear UCB algorithm; (ii) a constant term, independent of the time horizon, that accounts for the loss of being conservative in order to satisfy the safety constraint; and (iii) a constant term, independent of the time horizon, that accounts for the loss incurred because the contexts are unknown and only their distribution is known. To validate the performance of our approach, we perform numerical simulations on synthetic data and on real-world maize data collected through the Genomes to Fields (G2F) initiative.
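The core idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's exact algorithm: the confidence width `beta`, the noise levels, the Gaussian context model, and the assumption that the baseline arm's mean reward `r0` is known exactly are all illustrative choices. The learner never sees the realized context, so both the UCB scores and the regression updates use only each arm's expected feature vector, and a pessimistic lower bound on collected reward enforces the conservative constraint at every step.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 3, 5, 400
alpha = 0.1                       # fraction of baseline performance we may sacrifice
theta = rng.normal(size=d)        # hidden reward parameter
theta /= np.linalg.norm(theta)

# Arm context distributions: the learner knows only the mean feature of each arm.
means = rng.normal(size=(K, d))
means[0] = 0.3 * theta            # arm 0 is the safe baseline action
r0 = float(means[0] @ theta)      # baseline mean reward (assumed known), here 0.3

A, b = np.eye(d), np.zeros(d)     # ridge-regression sufficient statistics
beta = 1.0                        # confidence width (tuning parameter)
pess = 0.0                        # pessimistic lower bound on reward collected so far
history = []
for t in range(T):
    theta_hat = np.linalg.solve(A, b)
    Ainv = np.linalg.inv(A)
    # Confidence width per arm: beta * sqrt(mu_k^T A^{-1} mu_k)
    w = beta * np.sqrt(np.einsum('kd,de,ke->k', means, Ainv, means))
    ucb, lcb = means @ theta_hat + w, means @ theta_hat - w
    a = int(np.argmax(ucb))
    # Conservative check: even under the pessimistic estimate, cumulative reward
    # must stay above a (1 - alpha) fraction of the baseline's cumulative reward.
    if pess + lcb[a] >= (1 - alpha) * (t + 1) * r0:
        pess += lcb[a]            # optimistic arm is provably safe to play
    else:
        a = 0                     # fall back to the known baseline action
        pess += r0
    history.append(a)
    # Environment: the true context is mean + noise; the learner never observes
    # it, so the regression update uses the expected feature vector instead.
    x = means[a] + 0.05 * rng.normal(size=d)
    r = x @ theta + 0.05 * rng.normal()
    A += np.outer(means[a], means[a])
    b += r * means[a]

# The safety invariant holds by construction of the conservative check.
assert pess >= (1 - alpha) * T * r0 - 1e-9
```

By induction, if `pess >= (1 - alpha) * t * r0` before a step, it still holds afterwards: the optimistic branch is only taken when the check passes, and the baseline branch adds the full `r0` while the budget grows by only `(1 - alpha) * r0`. This mirrors the decomposition in the abstract: the fallback rounds contribute the constant conservatism term, while the use of expected features in place of exact contexts contributes the constant context-uncertainty term.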
Pages: 7321-7326
Number of pages: 6
Related Papers
50 items in total
  • [21] Federated Linear Contextual Bandits with Heterogeneous Clients
    Blaser, Ethan
    Li, Chuanhao
    Wang, Hongning
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [22] Group Meritocratic Fairness in Linear Contextual Bandits
    Grazzi, Riccardo
    Akhavan, Arya
    Falk, John Isak Texas
    Cella, Leonardo
    Pontil, Massimiliano
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [23] Linear Contextual Bandits with Hybrid Payoff: Revisited
    Das, Nirjhar
    Sinha, Gaurav
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES-RESEARCH TRACK, PT VI, ECML PKDD 2024, 2024, 14946 : 441 - 455
  • [24] Smoothed Adversarial Linear Contextual Bandits with Knapsacks
    Sivakumar, Vidyashankar
    Zuo, Shiliang
    Banerjee, Arindam
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [25] Leveraging Good Representations in Linear Contextual Bandits
    Papini, Matteo
    Tirinzoni, Andrea
    Restelli, Marcello
    Lazaric, Alessandro
    Pirotta, Matteo
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [26] Breaking the √T Barrier: Instance-Independent Logarithmic Regret in Stochastic Contextual Linear Bandits
    Ghosh, Avishek
    Sankararaman, Abishek
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [27] Stage-wise Conservative Linear Bandits
    Moradipari, Ahmadreza
    Thrampoulidis, Christos
    Alizadeh, Mahnoosh
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33 (NEURIPS 2020), 2020, 33
  • [28] Robust and private stochastic linear bandits
    Charisopoulos, Vasileios
    Esfandiari, Hossein
    Mirrokni, Vahab
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202, 2023, 202
  • [29] When Are Linear Stochastic Bandits Attackable?
    Wang, Huazheng
    Xu, Haifeng
    Wang, Hongning
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [30] Privacy Amplification via Shuffling for Linear Contextual Bandits
    Garcelon, Evrard
    Chaudhuri, Kamalika
    Perchet, Vianney
    Pirotta, Matteo
    INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 167, 2022, 167