Optimal cross-learning for contextual bandits with unknown context distributions

Authors
Schneider, Jon [1]
Zimmert, Julian [1]
Affiliations
[1] Google Research, Mountain View, CA 94043 USA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We consider the problem of designing contextual bandit algorithms in the "cross-learning" setting of Balseiro et al., where the learner observes the loss for the action they play in all possible contexts, not just the context of the current round. We specifically consider the setting where losses are chosen adversarially and contexts are sampled i.i.d. from an unknown distribution. In this setting, we resolve an open problem of Balseiro et al. by providing an efficient algorithm with a nearly tight (up to logarithmic factors) regret bound of $\widetilde{O}(\sqrt{TK})$, independent of the number of contexts. As a consequence, we obtain the first nearly tight regret bounds for the problems of learning to bid in first-price auctions (under unknown value distributions) and sleeping bandits with a stochastic action set. At the core of our algorithm is a novel technique for coordinating the execution of a learning algorithm over multiple epochs in such a way as to remove correlations between estimation of the unknown distribution and the actions played by the algorithm. This technique may be of independent interest for other learning problems involving estimation of an unknown context distribution.
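The cross-learning feedback described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes the context distribution `nu` is known, whereas the paper's contribution is the unknown case. The function name `cross_learning_round` and all parameter names are hypothetical. The sketch plays one round, observes the loss of the played action in every context, and builds an importance-weighted loss estimate that is unbiased under the known-distribution assumption.

```python
import random


def cross_learning_round(policy, nu, losses, rng):
    """Simulate one round of cross-learning feedback (illustrative only).

    policy[c][a]: probability of playing action a in context c.
    nu[c]:        probability of context c (ASSUMED KNOWN in this sketch).
    losses[c][a]: loss of action a in context c for this round.
    Returns the sampled context, the played action, and loss estimates.
    """
    contexts = range(len(nu))
    # The context is drawn i.i.d. from nu, then an action from the policy.
    c = rng.choices(contexts, weights=nu)[0]
    a = rng.choices(range(len(policy[c])), weights=policy[c])[0]

    # Cross-learning: the loss of the played action is seen in ALL contexts.
    observed = [losses[ctx][a] for ctx in contexts]

    # Probability of playing action a, averaged over the context draw.
    marginal = sum(nu[ctx] * policy[ctx][a] for ctx in contexts)

    # Importance-weighted estimate: nonzero only for the played action.
    estimates = [[0.0] * len(policy[ctx]) for ctx in contexts]
    for ctx in contexts:
        estimates[ctx][a] = observed[ctx] / marginal
    return c, a, estimates
```

Because the importance weight divides by the marginal probability of the action rather than by its probability in a single context, the variance of the estimate does not grow with the number of contexts. When `nu` is unknown, that marginal has to be estimated from data, which is where the correlations mentioned in the abstract arise.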
Pages: 19
Related papers
  • [1] Contextual Bandits with Cross-Learning
    Balseiro, Santiago
    Golrezaei, Negin
    Mahdian, Mohammad
    Mirrokni, Vahab
    Schneider, Jon
    MATHEMATICS OF OPERATIONS RESEARCH, 2023, 48 (03) : 1607 - 1629
  • [2] Contextual Bandits With Cross-Learning
    Balseiro, Santiago
    Golrezaei, Negin
    Mahdian, Mohammad
    Mirrokni, Vahab
    Schneider, Jon
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [3] CORRUPTED CONTEXTUAL BANDITS: ONLINE LEARNING WITH CORRUPTED CONTEXT
    Bouneffouf, Djallel
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 3145 - 3149
  • [4] Nonparametric Contextual Bandits in an Unknown Metric Space
    Wanigasekara, Nirandika
    Yu, Christina Lee
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [5] Stochastic Bandits with Context Distributions
    Kirschner, Johannes
    Krause, Andreas
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [6] Context Attentive Bandits: Contextual Bandit with Restricted Context
    Bouneffouf, Djallel
    Rish, Irina
    Cecchi, Guillermo A.
    Feraud, Raphael
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 1468 - 1475
  • [7] Learning from Distributed Users in Contextual Linear Bandits Without Sharing the Context
    Hanna, Osama A.
    Yang, Lin F.
    Fragouli, Christina
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [8] Learning Optimal Antenna Tilt Control Policies: A Contextual Linear Bandits Approach
    Vannella, Filippo
    Proutiere, Alexandre
    Jedra, Yassir
    Jeong, Jaeseong
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2024, 23 (12) : 12666 - 12679
  • [9] AdaLinUCB: Opportunistic Learning for Contextual Bandits
    Guo, Xueying
    Wang, Xiaoxiao
    Liu, Xin
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 2420 - 2427
  • [10] Optimal Algorithms for Stochastic Contextual Preference Bandits
    Saha, Aadirupa
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34