The Blinded Bandit: Learning with Adaptive Feedback

Cited by: 0
Authors
Dekel, Ofer [1 ]
Hazan, Elad [2 ]
Koren, Tomer [2 ]
Affiliations
[1] Microsoft Res, Redmond, WA 98052 USA
[2] Technion, Haifa, Israel
Keywords
REGRET;
DOI
Not available
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study an online learning setting where the player is temporarily deprived of feedback each time it switches to a different action. Such a model of adaptive feedback arises naturally in scenarios where the environment reacts to the player's actions and requires some time to recover and stabilize after the algorithm switches actions. This motivates a variant of the multi-armed bandit problem, which we call the blinded multi-armed bandit, in which no feedback is given to the algorithm whenever it switches arms. We develop efficient online learning algorithms for this problem and prove that they guarantee the same asymptotic regret as the optimal algorithms for the standard multi-armed bandit problem. This result stands in stark contrast to another recent result, which states that adding a switching cost to the standard multi-armed bandit makes it substantially harder to learn, and it provides a direct comparison of how feedback and loss contribute to the difficulty of an online learning problem. We also extend our results to the general prediction framework of bandit linear optimization, again attaining near-optimal regret bounds.
Pages: 9
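
To make the feedback protocol in the abstract concrete, the following is a minimal Python sketch of one way the blinded-bandit interaction could be simulated: the player always incurs the loss of the arm it pulls, but it observes that loss only on rounds where it did not switch arms. The names (play_blinded_bandit, RandomPlayer, choose, update) and the assumption of bounded losses in [0, 1] are illustrative choices for this sketch, not definitions taken from the paper.

    import numpy as np

    def play_blinded_bandit(player, losses):
        # losses: (T, k) array of adversarially chosen losses in [0, 1] (assumed).
        # player: any object exposing choose() -> arm index and update(arm, loss).
        T, k = losses.shape
        total_loss = 0.0
        prev_arm = None
        for t in range(T):
            arm = player.choose()
            total_loss += losses[t, arm]        # the loss is always incurred
            if arm == prev_arm:                 # blinding: feedback only if the arm was not switched
                player.update(arm, losses[t, arm])
            prev_arm = arm
        return total_loss

    class RandomPlayer:
        # Trivial baseline player used only to exercise the protocol.
        def __init__(self, k, seed=0):
            self.k = k
            self.rng = np.random.default_rng(seed)
        def choose(self):
            return int(self.rng.integers(self.k))
        def update(self, arm, loss):
            pass  # a real bandit algorithm would update its loss estimates here

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        losses = rng.random((1000, 5))          # T = 1000 rounds, k = 5 arms
        print(play_blinded_bandit(RandomPlayer(5), losses))

The point of the sketch is the single "if arm == prev_arm" line: it is the only difference from the standard multi-armed bandit loop, yet, as the abstract states, the paper shows it does not change the achievable asymptotic regret, whereas charging an explicit switching cost does.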