Fast Asymptotically Optimal Algorithms for Non-Parametric Stochastic Bandits

Cited by: 0
Authors
Baudry, Dorian [1]
Pesquerel, Fabien [2]
Degenne, Remy [2]
Maillard, Odalric-Ambrym [2]
Affiliations
[1] Ecole Polytech, CREST, Palaiseau, France
[2] Univ Lille, Cent Lille, CNRS, Inria, UMR 9189, CRIStAL, F-59000 Lille, France
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We consider the problem of regret minimization in non-parametric stochastic bandits. When the rewards are known to be bounded from above, there exist asymptotically optimal algorithms whose asymptotic regret depends on an infimum of Kullback-Leibler (KL) divergences. These algorithms are computationally expensive and require storing all past rewards, so simpler but non-optimal algorithms are often used instead. We introduce several methods to approximate the infimum KL that drastically reduce the computational and memory costs of existing optimal algorithms while preserving their regret guarantees. We apply our findings to design new variants of the MED and IMED algorithms, and demonstrate their interest through extensive numerical simulations.
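To make the cost described in the abstract concrete, here is a minimal sketch of the exact (non-approximated) baseline, not the authors' new methods. It assumes rewards bounded above by a known constant `upper` and computes the empirical infimum KL through the dual formula of Honda and Takemura, K_inf(F, mu) = max over lam in [0, 1/(upper - mu)] of E_F[log(1 - lam (X - mu))]. It then plugs that value into the IMED index N_k (K_inf + 1) style rule in its standard form, N_k * K_inf(F_k, mu*) + log N_k, playing the arm with the smallest index. The function names `kinf_upper_bounded` and `imed_choose` are hypothetical. Note that every evaluation loops over all stored rewards of an arm, which is exactly the time and memory burden the paper sets out to reduce.

```python
import math
from typing import List, Sequence


def kinf_upper_bounded(samples: Sequence[float], mu: float, upper: float,
                       iters: int = 100) -> float:
    """Exact empirical K_inf(F_hat, mu) for rewards bounded above by `upper`.

    Hypothetical helper. Uses the dual form
        max_{0 <= lam <= 1/(upper - mu)} mean(log(1 - lam * (x - mu))),
    whose objective is concave in lam, so ternary search finds the maximum.
    Cost is O(iters * len(samples)): all past rewards must be kept.
    """
    if not samples:
        raise ValueError("need at least one sample")
    if max(samples) > upper:
        raise ValueError("samples must not exceed the upper bound")
    mean = sum(samples) / len(samples)
    if mean >= mu:
        return 0.0  # the empirical distribution already has mean >= mu
    if mu >= upper:
        raise ValueError("mu must be strictly below the upper bound")

    def objective(lam: float) -> float:
        return sum(math.log(1.0 - lam * (x - mu)) for x in samples) / len(samples)

    # Stay a hair inside the boundary so log() never sees 0 for x == upper.
    lo, hi = 0.0, (1.0 - 1e-12) / (upper - mu)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            lo = m1  # maximum lies to the right of m1
        else:
            hi = m2  # maximum lies to the left of m2
    return max(0.0, objective((lo + hi) / 2))


def imed_choose(history: List[List[float]], upper: float) -> int:
    """Hypothetical IMED step: return the arm minimizing N_k * K_inf + log N_k.

    Assumes every arm has at least one reward and that the empirical best
    mean is strictly below `upper`.
    """
    means = [sum(h) / len(h) for h in history]
    best = max(means)
    indexes = [len(h) * kinf_upper_bounded(h, best, upper) + math.log(len(h))
               for h in history]
    return min(range(len(history)), key=indexes.__getitem__)
```

For example, with `upper = 1.0`, an arm whose only reward is `0.1` gets index log 6 (about 1.79) against an empirical best mean of 0.85. It is therefore re-sampled once the leader has been pulled more than about six times, since the leader's index is just log N.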
Pages: 46