Regret Bounds and Minimax Policies under Partial Monitoring

Times cited: 0
Authors
Audibert, Jean-Yves [1]
Bubeck, Sebastien [2]
Affiliations
[1] Univ Paris Est, F-77455 Champs-sur-Marne, France
[2] INRIA Lille, SequeL Project, F-59650 Villeneuve d'Ascq, France
Keywords
Bandits (adversarial and stochastic); regret bound; minimax rate; label efficient; upper confidence bound (UCB) policy; online learning; prediction with limited feedback
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
This work deals with four classical prediction settings, namely full information, bandit, label efficient and bandit label efficient, as well as four different notions of regret: pseudo-regret, expected regret, high-probability regret and tracking-the-best-expert regret. We introduce a new forecaster, INF (Implicitly Normalized Forecaster), based on an arbitrary function ψ, for which we propose a unified analysis of its pseudo-regret in the four games we consider. In particular, for ψ(x) = exp(ηx) + γ/K, INF reduces to the classical exponentially weighted average forecaster, and our analysis of the pseudo-regret recovers known results, while for the expected regret we slightly tighten the bounds. On the other hand, with ψ(x) = (η/(−x))^q + γ/K, which defines a new forecaster, we are able to remove the extraneous logarithmic factor in the pseudo-regret bounds for bandit games, and thus close a long-standing gap in the characterization of the minimax rate for the pseudo-regret in the bandit game. We also provide high-probability bounds depending on the cumulative reward of the optimal action. Finally, we consider the stochastic bandit game and prove that an appropriate modification of the upper confidence bound policy UCB1 (Auer et al., 2002a) achieves the distribution-free optimal rate while still having a distribution-dependent rate logarithmic in the number of plays.
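The following is a minimal Python sketch of the idea behind INF. It uses the polynomial choice ψ(x) = (η/(−x))^q + γ/K in the bandit game, where K is the number of actions. It is our reading of the abstract and not the authors' code. Several details are illustrative assumptions:
- a bisection search finds the normalizing constant C;
- the standard importance-weighted gain estimate g/p_i updates the cumulative estimates V_i;
- the function names poly_psi, inf_probabilities and run_poly_inf are hypothetical;
- the parameter values in the usage example are not the tuned constants from the paper.

```python
import random
from typing import List


def poly_psi(x: float, eta: float, q: float, gamma: float, k: int) -> float:
    """Polynomial potential psi(x) = (eta / -x)**q + gamma / K, defined for x < 0."""
    return (eta / -x) ** q + gamma / k


def inf_probabilities(v: List[float], eta: float, q: float = 2.0,
                      gamma: float = 0.0, tol: float = 1e-12) -> List[float]:
    """Return p_i = psi(V_i - C), with C found by bisection so the p_i sum to 1."""
    k = len(v)
    if eta <= 0.0 or q <= 0.0 or not 0.0 <= gamma < 1.0:
        raise ValueError("need eta > 0, q > 0 and 0 <= gamma < 1")
    v_max = max(v)
    # At C = v_max + eta the leading arm alone contributes psi >= 1.
    lo = v_max + eta
    # At this C every arm contributes at most (1 - gamma)/K + gamma/K.
    hi = v_max + eta * (k / (1.0 - gamma)) ** (1.0 / q)

    def total(c: float) -> float:
        return sum(poly_psi(vi - c, eta, q, gamma, k) for vi in v)

    # total(c) is strictly decreasing in c, so bisection brackets the unique root.
    while hi - lo > tol * max(1.0, abs(hi)):
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    c = 0.5 * (lo + hi)
    p = [poly_psi(vi - c, eta, q, gamma, k) for vi in v]
    s = sum(p)
    # Divide out the small residual error left by the bisection.
    return [pi / s for pi in p]


def run_poly_inf(gains: List[List[float]], eta: float, q: float = 2.0,
                 gamma: float = 0.0, seed: int = 0) -> float:
    """Play the bandit game on a fixed table gains[t][i] in [0, 1].

    Only the gain of the chosen arm is read each round. V_i is updated with the
    importance-weighted estimate g / p_i (an assumption of this sketch).
    Returns the forecaster's total gain. Requires at least one round.
    """
    rng = random.Random(seed)
    k = len(gains[0])
    v = [0.0] * k
    total_gain = 0.0
    for row in gains:
        p = inf_probabilities(v, eta, q, gamma)
        arm = rng.choices(range(k), weights=p)[0]
        g = row[arm]
        total_gain += g
        v[arm] += g / p[arm]
    return total_gain
```

As a usage example, run_poly_inf([[1.0, 0.0, 0.0]] * 2000, eta=2000 ** 0.5) plays 2000 rounds in which arm 0 always pays 1. As that arm's estimated cumulative gain pulls ahead, the forecaster shifts its probability mass toward it. Bisection is a natural way to compute C here. Since ψ is increasing, the sum of ψ(V_i − C) strictly decreases in C, so exactly one C makes the weights sum to 1.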
Pages: 2785-2836
Number of pages: 52