Regret Bounds and Minimax Policies under Partial Monitoring

Cited by: 0

Authors:

Audibert, Jean-Yves [1]

Bubeck, Sebastien [2]

Affiliations:

[1] Univ Paris Est, F-77455 Champs Sur Marne, France

[2] INRIA Lille, SequeL Project, F-59650 Villeneuve d'Ascq, France

Source:

JOURNAL OF MACHINE LEARNING RESEARCH | 2010 / Vol. 11

Keywords:

Bandits (adversarial and stochastic); regret bound; minimax rate; label efficient; upper confidence bound (UCB) policy; online learning; prediction with limited feedback;

DOI:

Not available

Chinese Library Classification:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

This work deals with four classical prediction settings, namely full information, bandit, label efficient, and bandit label efficient, as well as four different notions of regret: pseudo-regret, expected regret, high probability regret, and tracking the best expert regret. We introduce a new forecaster, INF (Implicitly Normalized Forecaster), based on an arbitrary function $\psi$, for which we propose a unified analysis of its pseudo-regret in the four games we consider. In particular, for $\psi(x) = \exp(\eta x) + \gamma/K$, INF reduces to the classical exponentially weighted average forecaster, and our analysis of the pseudo-regret recovers known results, while for the expected regret we slightly tighten the bounds. On the other hand, with $\psi(x) = (\eta/(-x))^{q} + \gamma/K$, which defines a new forecaster, we are able to remove the extraneous logarithmic factor in the pseudo-regret bounds for bandit games, and thus fill in a long open gap in the characterization of the minimax rate for the pseudo-regret in the bandit game. We also provide high probability bounds depending on the cumulative reward of the optimal action. Finally, we consider the stochastic bandit game, and prove that an appropriate modification of the upper confidence bound policy UCB1 (Auer et al., 2002a) achieves the distribution-free optimal rate while still having a distribution-dependent rate logarithmic in the number of plays.
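The abstract does not spell out how the "implicit normalization" works. A minimal sketch follows, assuming it means choosing a constant C so that the probabilities p_i = psi(V_i - C) sum to one, where V_i is a cumulative gain estimate for action i. The function names, the bisection search, and the bracketing offsets are illustrative assumptions, not taken from the record.

```python
import math

def inf_probabilities(gains, psi, lo_offset, hi_offset, iters=100):
    """Return p_i = psi(gains[i] - C), with C found so that the p_i sum to 1.

    Assumes psi is increasing, so the sum is decreasing in C, and that
    [max(gains) + lo_offset, max(gains) + hi_offset] brackets the root.
    """
    top = max(gains)
    lo, hi = top + lo_offset, top + hi_offset
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(psi(v - mid) for v in gains) > 1.0:
            lo = mid  # total mass too large: C must increase
        else:
            hi = mid  # total mass at most 1: C must not increase
    c = 0.5 * (lo + hi)
    probs = [psi(v - c) for v in gains]
    total = sum(probs)  # equals 1 up to bisection error
    return [p / total for p in probs]

def exponential_psi(eta, gamma, k):
    """psi(x) = exp(eta * x) + gamma / k, with bracketing offsets for C."""
    psi = lambda x: math.exp(eta * x) + gamma / k
    return psi, 0.0, math.log(k / (1.0 - gamma)) / eta

def polynomial_psi(eta, gamma, k, q):
    """psi(x) = (eta / (-x))**q + gamma / k for x < 0, with offsets for C."""
    psi = lambda x: (eta / (-x)) ** q + gamma / k
    return psi, eta, eta * (k / (1.0 - gamma)) ** (1.0 / q)

# Hypothetical gain estimates for K = 3 actions.
gains = [0.0, 1.0, 2.0]
psi_exp, lo_e, hi_e = exponential_psi(eta=0.5, gamma=0.1, k=3)
p_exp = inf_probabilities(gains, psi_exp, lo_e, hi_e)
psi_poly, lo_p, hi_p = polynomial_psi(eta=1.0, gamma=0.1, k=3, q=2.0)
p_poly = inf_probabilities(gains, psi_poly, lo_p, hi_p)
```

Under this reading, the exponential choice has a closed form: solving for C gives p_i = (1 - gamma) * exp(eta * V_i) / sum_j exp(eta * V_j) + gamma / K, that is, exponential weights mixed with the uniform distribution, which is consistent with the abstract's statement that INF reduces to the exponentially weighted average forecaster. The polynomial choice has no such closed form in general, which is why the sketch falls back on a numerical search for C.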


Pages: 2785 - 2836

Page count: 52

50 items in total

[41] Minimax regret priors for efficiency estimation
Tsionas, Mike G.
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2023, 309 (03) : 1279 - 1285
[42] Minimax Regret Path Location on Trees
Puerto, Justo
Ricca, Federica
Scozzari, Andrea
NETWORKS, 2011, 58 (02) : 147 - 158
[43] Axioms for minimax regret choice correspondences
Stoye, Joerg
JOURNAL OF ECONOMIC THEORY, 2011, 146 (06) : 2226 - 2251
[44] Minimax Regret for Stochastic Shortest Path
Cohen, Alon
Efroni, Yonathan
Mansour, Yishay
Rosenberg, Aviv
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
[45] Revenue management with minimax regret negotiations
Ayvaz-Cavdaroglu, Nur
Kachani, Soulaymane
Maglaras, Costis
OMEGA-INTERNATIONAL JOURNAL OF MANAGEMENT SCIENCE, 2016, 63 : 12 - 22
[46] Improved Regret Bounds for Online Kernel Selection Under Bandit Feedback
Li, Junfan
Liao, Shizhong
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT IV, 2023, 13716 : 333 - 348
[47] Minimax risk (regret) strategy for one class of control problems under dynamic disturbances
Serkov, D. A.
PROCEEDINGS OF THE STEKLOV INSTITUTE OF MATHEMATICS, 2008, 263 : 202 - 211
[48] Optimization Under Severe Uncertainty: a Generalized Minimax Regret Approach for Problems with Linear Objectives
Vu, Tuan-Anh
Afifi, Sohaib
Lefevre, Eric
Pichon, Frederic
BELIEF FUNCTIONS: THEORY AND APPLICATIONS, BELIEF 2024, 2024, 14909 : 197 - 204
[49] Agricultural Water Management under Uncertainty Using Minimax Relative Regret Analysis Method
Du, P.
Li, Y. P.
Huang, G. H.
JOURNAL OF IRRIGATION AND DRAINAGE ENGINEERING, 2012, 138 (12) : 1033 - 1045
[50] Minimax risk (regret) strategy for one class of control problems under dynamic disturbances
Serkov, D. A.
TRUDY INSTITUTA MATEMATIKI I MEKHANIKI URO RAN, 2008, 14 (02) : 192 - 200
