Fast Asymptotically Optimal Algorithms for Non-Parametric Stochastic Bandits

Cited by: 0
Authors
Baudry, Dorian [1]
Pesquerel, Fabien [2]
Degenne, Remy [2]
Maillard, Odalric-Ambrym [2]
Affiliations
[1] Ecole Polytech, CREST, Palaiseau, France
[2] Univ Lille, Cent Lille, CNRS, Inria, UMR 9189, CRIStAL, F-59000 Lille, France
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We consider the problem of regret minimization in non-parametric stochastic bandits. When the rewards are known to be bounded from above, there exist asymptotically optimal algorithms, with asymptotic regret depending on an infimum of Kullback-Leibler (KL) divergences. These algorithms are computationally expensive and require storing all past rewards, so simpler but non-optimal algorithms are often used instead. We introduce several methods to approximate the infimum KL which drastically reduce the computational and memory costs of existing optimal algorithms, while keeping their regret guarantees. We apply our findings to design new variants of the MED and IMED algorithms, and demonstrate their practical interest with extensive numerical simulations.
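To make the "infimum KL" concrete, here is a minimal Python sketch of how it is commonly computed for rewards bounded above. It is not taken from this record and it does not implement the approximation methods the abstract announces. It assumes the standard dual form from the literature, K_inf(F, mu) = max over lam in [0, 1/(B - mu)] of E_F[log(1 - lam (X - mu))], and the usual IMED index N * K_inf + log N. The names `kinf_bounded`, `imed_index`, and `upper_bound` are hypothetical. The sketch shows why the exact computation is costly: every evaluation scans all stored rewards of an arm.

```python
import math


def kinf_bounded(rewards, mu, upper_bound, iters=100):
    """Empirical K_inf for rewards bounded above by `upper_bound` (assumed dual form).

    Maximizes mean(log(1 - lam * (x - mu))) over lam in [0, 1 / (upper_bound - mu)]
    with a ternary search, which is valid because the objective is concave in lam.
    """
    if not mu < upper_bound:
        raise ValueError("mu must be strictly below the upper bound")
    n = len(rewards)
    if n == 0:
        raise ValueError("at least one reward is required")
    # If the empirical mean already reaches mu, the infimum is zero.
    if sum(rewards) / n >= mu:
        return 0.0

    def objective(lam):
        # One full pass over the stored rewards per evaluation.
        total = 0.0
        for x in rewards:
            arg = 1.0 - lam * (x - mu)
            if arg <= 0.0:
                return -math.inf
            total += math.log(arg)
        return total / n

    lo, hi = 0.0, 1.0 / (upper_bound - mu)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective((lo + hi) / 2.0)


def imed_index(rewards, best_mean, upper_bound):
    """IMED-style index N * K_inf + log N (assumed form); the smallest index is pulled."""
    n = len(rewards)
    return n * kinf_bounded(rewards, best_mean, upper_bound) + math.log(n)


if __name__ == "__main__":
    arm = [0.0, 0.0, 0.0, 1.0]  # illustrative rewards, not data from the paper
    print(kinf_bounded(arm, mu=0.5, upper_bound=1.0))
    print(imed_index(arm, best_mean=0.5, upper_bound=1.0))
```

For rewards in {0, 1} with upper bound 1, this dual form reduces to the Bernoulli KL divergence between the empirical mean and mu, which gives a convenient sanity check.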
Pages: 46