No-Regret Linear Bandits beyond Realizability

Cited by: 0
Authors
Liu, Chong [1 ]
Yin, Ming [1 ]
Wang, Yu-Xiang [1 ]
Affiliations
[1] Univ Calif Santa Barbara, Dept Comp Sci, Santa Barbara, CA 93106 USA
Source
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We study linear bandits when the underlying reward function is not linear. Existing work relies on a uniform misspecification parameter ε that measures the sup-norm error of the best linear approximation, which results in an unavoidable linear regret whenever ε > 0. We describe a more natural model of misspecification that only requires the approximation error at each input x to be proportional to the suboptimality gap at x. It captures the intuition that, in optimization problems, near-optimal regions should matter more, so larger approximation errors can be tolerated in suboptimal regions. Quite surprisingly, we show that the classical LinUCB algorithm (designed for the realizable case) is automatically robust against such gap-adjusted misspecification: it achieves a near-optimal √T regret on problems for which the best previously known regret is almost linear in the time horizon T. Technically, our proof relies on a novel self-bounding argument that bounds the part of the regret due to misspecification by the regret itself.
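As a rough illustration of the contrast drawn in the abstract, here is one way to write the two misspecification conditions; the notation (reward f, feature map φ, best linear fit θ, maximizer x*) is ours, and the paper's exact definitions may differ:

    Uniform:       \sup_x \bigl| f(x) - \langle \theta, \phi(x) \rangle \bigr| \le \varepsilon
    Gap-adjusted:  \bigl| f(x) - \langle \theta, \phi(x) \rangle \bigr| \le \rho \,\bigl( f(x^*) - f(x) \bigr) \quad \text{for all } x

for some constant ρ ≥ 0 (assumed sufficiently small). The gap-adjusted condition forces the linear model to be exact at optimal inputs while tolerating larger errors where the suboptimality gap f(x^*) - f(x) is large, which is what allows LinUCB to retain a √T regret guarantee.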
Pages: 1294-1303
Number of pages: 10