Regret Bounds and Minimax Policies under Partial Monitoring

Cited by: 0
Authors
Audibert, Jean-Yves [1]
Bubeck, Sebastien [2]
Affiliations
[1] Univ Paris Est, F-77455 Champs Sur Marne, France
[2] INRIA Lille, SequeL Project, F-59650 Villeneuve d'Ascq, France
Keywords
Bandits (adversarial and stochastic); regret bound; minimax rate; label efficient; upper confidence bound (UCB) policy; online learning; prediction with limited feedback
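DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract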
This work deals with four classical prediction settings, namely full information, bandit, label efficient, and bandit label efficient, as well as four different notions of regret: pseudo-regret, expected regret, high probability regret, and tracking the best expert regret. We introduce a new forecaster, INF (Implicitly Normalized Forecaster), based on an arbitrary function psi, for which we propose a unified analysis of its pseudo-regret in the four games we consider. In particular, for psi(x) = exp(eta x) + gamma/K, INF reduces to the classical exponentially weighted average forecaster, and our analysis of the pseudo-regret recovers known results, while for the expected regret we slightly tighten the bounds. On the other hand, with psi(x) = (eta/(-x))^q + gamma/K, which defines a new forecaster, we are able to remove the extraneous logarithmic factor in the pseudo-regret bounds for bandit games, and thus fill in a long open gap in the characterization of the minimax rate for the pseudo-regret in the bandit game. We also provide high probability bounds depending on the cumulative reward of the optimal action. Finally, we consider the stochastic bandit game, and prove that an appropriate modification of the upper confidence bound policy UCB1 (Auer et al., 2002a) achieves the distribution-free optimal rate while still having a distribution-dependent rate logarithmic in the number of plays.
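The abstract names the two choices of psi but does not spell out the normalization step. The following Python sketch illustrates one reading of "implicitly normalized": given per-arm cumulative gain estimates, pick the constant C for which the values psi(gain_i - C) sum to 1 and use them as the sampling distribution. This reading, the bisection solver, the bracketing bounds, and all function and parameter names (`solve_normalizer`, `inf_exponential`, `inf_polynomial`, `gains`) are assumptions made for illustration, not the authors' code.

```python
import math


def solve_normalizer(gains, psi, lo, hi, iters=100):
    """Bisection for C in [lo, hi] such that sum_i psi(g_i - C) = 1.

    Assumes psi is increasing, so the sum is decreasing in C, and that
    the sum is >= 1 at lo and <= 1 at hi.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(psi(g - mid) for g in gains) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def inf_exponential(gains, eta, gamma):
    """Probabilities for psi(x) = exp(eta * x) + gamma / K."""
    k = len(gains)
    top = max(gains)
    psi = lambda x: math.exp(eta * x) + gamma / k
    # At lo the best arm alone contributes 1 - gamma; at hi every arm
    # contributes at most (1 - gamma) / k. This brackets the solution.
    lo = top + math.log(1.0 / (1.0 - gamma)) / eta
    hi = top + math.log(k / (1.0 - gamma)) / eta
    c = solve_normalizer(gains, psi, lo, hi)
    return [psi(g - c) for g in gains]


def inf_polynomial(gains, eta, gamma, q):
    """Probabilities for psi(x) = (eta / (-x))**q + gamma / K, x < 0."""
    k = len(gains)
    top = max(gains)
    psi = lambda x: (eta / (-x)) ** q + gamma / k
    # Same bracketing idea; C stays strictly above max(gains), so every
    # argument passed to psi is negative.
    lo = top + eta * (1.0 / (1.0 - gamma)) ** (1.0 / q)
    hi = top + eta * (k / (1.0 - gamma)) ** (1.0 / q)
    c = solve_normalizer(gains, psi, lo, hi)
    return [psi(g - c) for g in gains]


# Hypothetical cumulative gain estimates for three arms.
gains = [1.0, 2.0, 4.0]
p_exp = inf_exponential(gains, eta=0.5, gamma=0.1)
p_poly = inf_polynomial(gains, eta=0.5, gamma=0.1, q=2.0)
```

Under this reading, the exponential choice has a closed form: C cancels out and the probabilities are the usual exponential weights mixed with a uniform distribution, which is the reduction the abstract mentions. The polynomial choice has no such closed form in general, which is why the sketch solves for C numerically.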
Pages: 2785-2836
Page count: 52
Related papers
50 in total
  • [1] Regret bounds and minimax policies under partial monitoring
    Audibert, Jean-Yves
    Bubeck, Sébastien
    JOURNAL OF MACHINE LEARNING RESEARCH, 2010, 11 : 2785 - 2836
  • [2] A PDE approach for regret bounds under partial monitoring
    Bayraktar, Erhan
    Ekren, Ibrahim
    Zhang, Xin
    JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [3] Minimax Regret for Partial Monitoring: Infinite Outcomes and Rustichini's Regret
    Lattimore, Tor
    CONFERENCE ON LEARNING THEORY, VOL 178, 2022, 178
  • [4] An Information-Theoretic Approach to Minimax Regret in Partial Monitoring
    Lattimore, Tor
    Szepesvari, Csaba
    CONFERENCE ON LEARNING THEORY, VOL 99, 2019, 99
  • [5] Minimax Regret Bounds for Reinforcement Learning
    Azar, Mohammad Gheshlaghi
    Osband, Ian
    Munos, Remi
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [6] Partial Monitoring-Classification, Regret Bounds, and Algorithms
    Bartok, Gabor
    Foster, Dean P.
    Pal, David
    Rakhlin, Alexander
    Szepesvari, Csaba
    MATHEMATICS OF OPERATIONS RESEARCH, 2014, 39 (04) : 967 - 997
  • [7] Regret minimization under partial monitoring
    Cesa-Bianchi, Nicolo
    Lugosi, Gabor
    Stoltz, Gilles
    MATHEMATICS OF OPERATIONS RESEARCH, 2006, 31 (03) : 562 - 580
  • [8] Regret minimization under partial monitoring
    Cesa-Bianchi, Nicolo
    Lugosi, Gabor
    Stoltz, Gilles
    2006 IEEE INFORMATION THEORY WORKSHOP, 2006, : 72 - +
  • [9] Tight Bounds on Minimax Regret under Logarithmic Loss via Self-Concordance
    Bilodeau, Blair
    Foster, Dylan J.
    Roy, Daniel M.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [10] Tight Bounds on Minimax Regret under Logarithmic Loss via Self-Concordance
    Bilodeau, Blair
    Foster, Dylan J.
    Roy, Daniel M.
    25TH AMERICAS CONFERENCE ON INFORMATION SYSTEMS (AMCIS 2019), 2019,