On the Minimax Regret for Online Learning with Feedback Graphs

Cited: 0
Authors
Eldowa, Khaled [1]
Esposito, Emmanuel [1,2]
Cesari, Tommaso [3]
Cesa-Bianchi, Nicolo [1]
Affiliations
[1] Univ Milan, Milan, Italy
[2] Ist Italiano Tecnol, Genoa, Italy
[3] Univ Ottawa, Ottawa, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
DOI
None available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
In this work, we improve on the upper and lower bounds for the regret of online learning with strongly observable undirected feedback graphs. The best known upper bound for this problem is $O\bigl(\sqrt{\alpha T \ln K}\bigr)$, where $K$ is the number of actions, $\alpha$ is the independence number of the graph, and $T$ is the time horizon. The $\sqrt{\ln K}$ factor is known to be necessary when $\alpha = 1$ (the experts case). On the other hand, when $\alpha = K$ (the bandits case), the minimax rate is known to be $\Theta\bigl(\sqrt{KT}\bigr)$, and a lower bound of $\Omega\bigl(\sqrt{\alpha T}\bigr)$ is known to hold for any $\alpha$. Our improved upper bound $O\bigl(\sqrt{\alpha T (1 + \ln(K/\alpha))}\bigr)$ holds for any $\alpha$ and matches the lower bounds for bandits and experts, while interpolating intermediate cases. To prove this result, we use FTRL with the $q$-Tsallis entropy for a carefully chosen value of $q \in [1/2, 1)$ that varies with $\alpha$. The analysis of this algorithm requires a new bound on the variance term in the regret. We also show how to extend our techniques to time-varying graphs, without requiring prior knowledge of their independence numbers. Our upper bound is complemented by an improved $\Omega\bigl(\sqrt{\alpha T (\ln K)/\ln \alpha}\bigr)$ lower bound for all $\alpha > 1$, whose analysis relies on a novel reduction to multitask learning. This shows that a logarithmic factor is necessary as soon as $\alpha < K$.
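As a sanity check on the new bound, plugging in $\alpha = 1$ gives $O\bigl(\sqrt{T(1 + \ln K)}\bigr)$ (the experts rate), while $\alpha = K$ gives $O\bigl(\sqrt{KT}\bigr)$ (the bandits rate). The sketch below illustrates the algorithmic template behind the upper bound: FTRL over the probability simplex with the $q$-Tsallis regularizer, fed importance-weighted loss estimates built from graph feedback. This is a minimal illustration, not the paper's exact method: the function names are ours, $\eta$ and $q$ are left as free parameters (the paper tunes $q \in [1/2, 1)$ as a function of $\alpha$), and the graph is assumed undirected with all self-loops present.

```python
import numpy as np

def tsallis_ftrl_distribution(L, eta, q, tol=1e-10):
    """p = argmin_{p in simplex} eta*<p, L> + (1 - sum_i p_i^q)/(1 - q).

    Solved by bisection on the Lagrange multiplier lam, using the
    stationarity condition p_i = (q / ((1-q)*(eta*L_i + lam)))**(1/(1-q)).
    """
    L = L - L.min()                       # FTRL is invariant to shifting L
    w = lambda lam: (q / ((1.0 - q) * (eta * L + lam))) ** (1.0 / (1.0 - q))
    lo = q / (1.0 - q)                    # here the smallest-loss arm gets mass 1
    hi = lo + 1.0
    while w(hi).sum() > 1.0:              # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if w(mid).sum() > 1.0 else (lo, mid)
    p = w(hi)
    return p / p.sum()                    # renormalise away bisection error

def q_ftrl_feedback_graph(adj, losses, eta, q, rng=None):
    """q-Tsallis FTRL with undirected graph feedback (self-loops assumed).

    adj: (K, K) boolean adjacency matrix with adj[i, i] = True for all i.
    losses: (T, K) losses in [0, 1]; entry (t, i) is revealed whenever the
            played arm is a neighbour of i.
    """
    rng = np.random.default_rng() if rng is None else rng
    A = adj.astype(float)
    T, K = losses.shape
    Lhat = np.zeros(K)                    # cumulative loss estimates
    total = 0.0
    for t in range(T):
        p = tsallis_ftrl_distribution(Lhat, eta, q)
        arm = rng.choice(K, p=p)
        total += losses[t, arm]
        obs_prob = A @ p                  # P(i observed) = sum_{j in N(i)} p_j
        Lhat += np.where(adj[arm], losses[t] / obs_prob, 0.0)
    return total

# Toy run: K = 8 arms in two cliques of 4 (independence number alpha = 2).
K = 8
adj = np.zeros((K, K), dtype=bool)
adj[:4, :4] = adj[4:, 4:] = True
losses = np.random.default_rng(0).random((1000, K))
print(q_ftrl_feedback_graph(adj, losses, eta=0.05, q=0.5))
```

Bisection on the Lagrange multiplier is one simple way to compute the FTRL step; any one-dimensional root-finding routine works equally well, since the normalisation constraint is monotone in the multiplier.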
Pages: 12
Related papers
50 records in total
  • [31] An Online Learning Analysis of Minimax Adaptive Control
    Renganathan, Venkatraman
    Iannelli, Andrea
    Rantzer, Anders
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023, : 1034 - 1039
  • [32] A Reduction from Reinforcement Learning to No-Regret Online Learning
    Cheng, Ching-An
    des Combes, Remi Tachet
    Boots, Byron
    Gordon, Geoff
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 3514 - 3523
  • [33] Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path
    Di, Qiwei
    He, Jiafan
    Zhou, Dongruo
    Gu, Quanquan
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202, 2023, 202
  • [34] Reinforcement Learning with Feedback Graphs
    Dann, Christoph
    Mansour, Yishay
    Mohri, Mehryar
    Sekhari, Ayush
    Sridharan, Karthik
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [35] Existence and stability of minimax regret equilibria
    Yang, Zhe
    Pu, Yong Jian
    JOURNAL OF GLOBAL OPTIMIZATION, 2012, 54 (01) : 17 - 26
  • [36] A Minimax Regret Approach to Robust Beamforming
    Byun, Jungsub
    Mutapcic, Almir
    Kim, Seung-Jean
    Cioffi, John M.
    2009 IEEE 70TH VEHICULAR TECHNOLOGY CONFERENCE FALL, VOLS 1-4, 2009, : 1531 - 1536
  • [37] MINIMAX REGRET APPLICABLE TO VOTING DECISIONS
    MAYER, LS
    GOOD, IJ
    AMERICAN POLITICAL SCIENCE REVIEW, 1975, 69 (03) : 916 - 917
  • [38] Asymptotically minimax regret by Bayes mixtures
    Takeuchi, J
    Barron, AR
    1998 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY - PROCEEDINGS, 1998, : 318 - 318
  • [39] Possibilistic Preference Elicitation by Minimax Regret
    Adam, Loic
    Destercke, Sebastien
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, VOL 161, 2021, 161 : 718 - 727
  • [40] Minimax regret estimation in linear models
    Eldar, YC
    Ben-Tal, A
    Nemirovski, A
    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL II, PROCEEDINGS: SENSOR ARRAY AND MULTICHANNEL SIGNAL PROCESSING SIGNAL PROCESSING THEORY AND METHODS, 2004, : 161 - 164