Optimal cross-learning for contextual bandits with unknown context distributions

Cited by: 0
Authors
Schneider, Jon [1]
Zimmert, Julian [1]
Affiliations
[1] Google Res, Mountain View, CA 94043 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We consider the problem of designing contextual bandit algorithms in the "cross-learning" setting of Balseiro et al., where the learner observes the loss for the action they play in all possible contexts, not just the context of the current round. We specifically consider the setting where losses are chosen adversarially and contexts are sampled i.i.d. from an unknown distribution. In this setting, we resolve an open problem of Balseiro et al. by providing an efficient algorithm with a nearly tight (up to logarithmic factors) regret bound of $\tilde{O}(\sqrt{TK})$, independent of the number of contexts. As a consequence, we obtain the first nearly tight regret bounds for the problems of learning to bid in first-price auctions (under unknown value distributions) and sleeping bandits with a stochastic action set. At the core of our algorithm is a novel technique for coordinating the execution of a learning algorithm over multiple epochs in such a way to remove correlations between estimation of the unknown distribution and the actions played by the algorithm. This technique may be of independent interest for other learning problems involving estimation of an unknown context distribution.
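To make the cross-learning feedback model concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes the context distribution is known (the simpler setting; the abstract concerns the unknown case) and runs exponential weights per context, using the loss of the played action in every context to update all contexts at once. The function name `run_cross_learning_exp3`, the parameters `losses`, `context_probs`, and `eta`, and the importance-weighted estimator are illustrative assumptions.

```python
import numpy as np

def run_cross_learning_exp3(losses, context_probs, eta, rng):
    """Illustrative sketch only: exponential weights with cross-learning feedback.

    losses:        array of shape (T, C, K) with entries in [0, 1]
    context_probs: array of shape (C,), assumed KNOWN here (unlike the paper)
    eta:           learning rate
    rng:           numpy random Generator
    """
    T, C, K = losses.shape
    cum_est = np.zeros((C, K))  # cumulative loss estimates per (context, action)
    total_loss = 0.0
    policy = np.full((C, K), 1.0 / K)
    for t in range(T):
        # One exponential-weights distribution over actions for each context.
        logits = -eta * cum_est
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        policy = np.exp(logits)
        policy /= policy.sum(axis=1, keepdims=True)

        # Context is drawn i.i.d.; the action is drawn from that context's row.
        c_t = rng.choice(C, p=context_probs)
        a_t = rng.choice(K, p=policy[c_t])
        total_loss += losses[t, c_t, a_t]

        # Probability of playing a_t, averaged over the random context.
        q = context_probs @ policy[:, a_t]

        # Cross-learning: the loss of a_t is observed in EVERY context,
        # so every context's estimate for a_t is updated.
        cum_est[:, a_t] += losses[t, :, a_t] / q
    return total_loss, policy
```

Because the update for the played action touches all contexts, the importance weight depends on the context-averaged probability `q` rather than on the probability within the realized context alone. Estimating `q` when the distribution is unknown is where the correlations mentioned in the abstract arise, and this sketch does not address them.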
Pages: 19