No-Regret Linear Bandits beyond Realizability

Cited by: 0
Authors
Liu, Chong [1 ]
Yin, Ming [1 ]
Wang, Yu-Xiang [1 ]
Affiliations
[1] Univ Calif Santa Barbara, Dept Comp Sci, Santa Barbara, CA 93106 USA
Source
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We study linear bandits when the underlying reward function is not linear. Existing work relies on a uniform misspecification parameter ε that measures the sup-norm error of the best linear approximation, which results in an unavoidable linear regret whenever ε > 0. We describe a more natural model of misspecification that only requires the approximation error at each input x to be proportional to the suboptimality gap at x. It captures the intuition that, in optimization problems, near-optimal regions should matter more, so larger approximation errors can be tolerated in suboptimal regions. Quite surprisingly, we show that the classical LinUCB algorithm (designed for the realizable case) is automatically robust against such gap-adjusted misspecification: it achieves a near-optimal √T regret on problems for which the best previously known regret is almost linear in the time horizon T. Technically, our proof relies on a novel self-bounding argument that bounds the part of the regret due to misspecification by the regret itself.
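As a rough illustration of the contrast drawn in the abstract, here is one way to write the two misspecification conditions; the notation (reward f, feature map φ, best linear fit θ, maximizer x*) is ours, and the paper's exact definitions may differ:

    Uniform:       \sup_x \bigl| f(x) - \langle \theta, \phi(x) \rangle \bigr| \le \varepsilon
    Gap-adjusted:  \bigl| f(x) - \langle \theta, \phi(x) \rangle \bigr| \le \rho \,\bigl( f(x^*) - f(x) \bigr) \quad \text{for all } x

for some constant ρ ≥ 0 (assumed sufficiently small). The gap-adjusted condition forces the linear model to be exact at optimal inputs while tolerating larger errors where the suboptimality gap f(x^*) - f(x) is large, which is what allows LinUCB to retain a √T regret guarantee.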
Pages: 1294-1303
Number of pages: 10