On the Minimax Regret for Online Learning with Feedback Graphs

Cited by: 0
Authors
Eldowa, Khaled [1]
Esposito, Emmanuel [1, 2]
Cesari, Tommaso [3]
Cesa-Bianchi, Nicolo [1]
Affiliations
[1] Univ Milan, Milan, Italy
[2] Ist Italiano Tecnol, Genoa, Italy
[3] Univ Ottawa, Ottawa, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this work, we improve on the upper and lower bounds for the regret of online learning with strongly observable undirected feedback graphs. The best known upper bound for this problem is $O(\sqrt{\alpha T \ln K})$, where $K$ is the number of actions, $\alpha$ is the independence number of the graph, and $T$ is the time horizon. The $\sqrt{\ln K}$ factor is known to be necessary when $\alpha = 1$ (the experts case). On the other hand, when $\alpha = K$ (the bandits case), the minimax rate is known to be $\Theta(\sqrt{KT})$, and a lower bound $\Omega(\sqrt{\alpha T})$ is known to hold for any $\alpha$. Our improved upper bound $O(\sqrt{\alpha T (1 + \ln(K/\alpha))})$ holds for any $\alpha$ and matches the lower bounds for bandits and experts, while interpolating intermediate cases. To prove this result, we use FTRL with $q$-Tsallis entropy for a carefully chosen value of $q \in [1/2, 1)$ that varies with $\alpha$. The analysis of this algorithm requires a new bound on the variance term in the regret. We also show how to extend our techniques to time-varying graphs, without requiring prior knowledge of their independence numbers. Our upper bound is complemented by an improved $\Omega(\sqrt{\alpha T (\ln K)/(\ln \alpha)})$ lower bound for all $\alpha > 1$, whose analysis relies on a novel reduction to multitask learning. This shows that a logarithmic factor is necessary as soon as $\alpha < K$.
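For reference, the bounds stated in the abstract can be set side by side, together with a quick check of the two endpoint cases. This is only a restatement of the abstract's claims under assumed notation: the symbol $R_T$ for the minimax regret does not appear in the abstract and is introduced here purely for illustration.

```latex
% R_T is an assumed symbol for the minimax regret; it is not used in the abstract.
\begin{align*}
  \text{previous upper bound:} \quad & R_T = O\!\left(\sqrt{\alpha T \ln K}\right) \\
  \text{previous lower bound:} \quad & R_T = \Omega\!\left(\sqrt{\alpha T}\right) \\
  \text{improved upper bound:} \quad & R_T = O\!\left(\sqrt{\alpha T \left(1 + \ln(K/\alpha)\right)}\right) \\
  \text{improved lower bound } (\alpha > 1): \quad & R_T = \Omega\!\left(\sqrt{\alpha T \, \frac{\ln K}{\ln \alpha}}\right)
\end{align*}
% Endpoint check of the improved upper bound:
\begin{align*}
  \alpha = 1 \ (\text{experts}): \quad & \sqrt{T\,(1 + \ln K)} \\
  \alpha = K \ (\text{bandits}): \quad & \sqrt{K T\,(1 + \ln 1)} = \sqrt{K T}
\end{align*}
```

At $\alpha = 1$ the improved upper bound keeps the $\sqrt{\ln K}$ dependence that the abstract says is necessary for experts, and at $\alpha = K$ the logarithm vanishes, recovering the $\Theta(\sqrt{KT})$ bandit rate.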
Pages: 12