Regret Bounds and Minimax Policies under Partial Monitoring

Cited by: 0
Authors
Audibert, Jean-Yves [1]
Bubeck, Sebastien [2]
Affiliations
[1] Univ Paris Est, F-77455 Champs-sur-Marne, France
[2] INRIA Lille, SequeL Project, F-59650 Villeneuve d'Ascq, France
Keywords
Bandits (adversarial and stochastic); regret bound; minimax rate; label efficient; upper confidence bound (UCB) policy; online learning; prediction with limited feedback
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline code
0812
Abstract
This work deals with four classical prediction settings, namely full information, bandit, label efficient, and bandit label efficient, as well as four different notions of regret: pseudo-regret, expected regret, high-probability regret, and regret against tracking the best expert. We introduce a new forecaster, INF (Implicitly Normalized Forecaster), based on an arbitrary function psi, for which we propose a unified analysis of its pseudo-regret in the four games we consider. In particular, for psi(x) = exp(eta x) + gamma/K, INF reduces to the classical exponentially weighted average forecaster, and our analysis of the pseudo-regret recovers known results, while for the expected regret we slightly tighten the bounds. On the other hand, with psi(x) = (eta/(-x))^q + gamma/K, which defines a new forecaster, we are able to remove the extraneous logarithmic factor in the pseudo-regret bounds for bandit games, and thus close a long-standing gap in the characterization of the minimax rate for the pseudo-regret in the bandit game. We also provide high-probability bounds depending on the cumulative reward of the optimal action. Finally, we consider the stochastic bandit game and prove that an appropriate modification of the upper confidence bound policy UCB1 (Auer et al., 2002a) achieves the distribution-free optimal rate while still having a distribution-dependent rate logarithmic in the number of plays.
Pages: 2785-2836
Page count: 52
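
As a companion to the abstract above, the following is a minimal sketch of how the polynomial variant of INF could be implemented for the bandit game: arm probabilities are obtained by implicitly normalizing psi applied to shifted cumulative gain estimates, and the estimates are updated by importance weighting. The bisection search for the normalizing constant, the parameter defaults, and the names poly_psi, inf_probabilities, run_inf_bandit, and reward_fn are assumptions of this sketch, not taken from the paper.

import numpy as np

def poly_psi(x, eta, q, gamma, K):
    # psi(x) = (eta / (-x))^q + gamma/K, defined for x < 0 (sketch of the paper's polynomial psi).
    return (eta / (-x)) ** q + gamma / K

def inf_probabilities(G, eta, q, gamma):
    # Implicit normalization: find c > max(G) such that sum_i psi(G_i - c) = 1.
    K = len(G)
    lo = G.max() + 1e-12                                     # sum of psi exceeds 1 just above max(G)
    hi = G.max() + eta * (K / (1.0 - gamma)) ** (1.0 / q)    # sum of psi is at most 1 here
    for _ in range(80):                                      # bisection on the constant c
        c = 0.5 * (lo + hi)
        if poly_psi(G - c, eta, q, gamma, K).sum() > 1.0:
            lo = c
        else:
            hi = c
    p = poly_psi(G - 0.5 * (lo + hi), eta, q, gamma, K)
    return p / p.sum()                                       # renormalize to absorb bisection error

def run_inf_bandit(reward_fn, K, n, eta, q=2, gamma=0.0, seed=0):
    # Play n rounds of the bandit game; only the reward of the chosen arm is observed.
    rng = np.random.default_rng(seed)
    G = np.zeros(K)                      # cumulative importance-weighted gain estimates
    for t in range(n):
        p = inf_probabilities(G, eta, q, gamma)
        i = rng.choice(K, p=p)
        r = reward_fn(t, i)              # reward in [0, 1] of the played arm at round t
        G[i] += r / p[i]                 # unbiased estimate of the unobserved gain vector
    return G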