Online Boosting with Bandit Feedback

Times cited: 0

Authors:

Brukhim, Nataly [1, 2]

Hazan, Elad [1, 2]

Affiliations:

[1] Princeton Univ, Dept Comp Sci, Princeton, NJ 08544 USA

[2] Google AI, Princeton, NJ 08544 USA

Source:

ALGORITHMIC LEARNING THEORY, VOL 132 | 2021 / Vol. 132

Keywords:

ALGORITHMS; OPTIMIZATION

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We consider the problem of online boosting for regression tasks, when only limited information is available to the learner. This setting is motivated by applications in reinforcement learning, in which only partial feedback is provided to the learner. We give an efficient regret minimization method that has two implications. First, we describe an online boosting algorithm with noisy multi-point bandit feedback. Next, we give a new projection-free online convex optimization algorithm with stochastic gradient access that improves state-of-the-art guarantees in terms of efficiency. Our analysis offers a novel way of incorporating stochastic gradient estimators within Frank-Wolfe-type methods, which circumvents the instability encountered when directly applying projection-free optimization to the stochastic setting.
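The abstract combines two ingredients: gradient estimates built from multi-point bandit feedback (function values only) and projection-free, Frank-Wolfe-type updates. The sketch below is not the authors' algorithm; it is a generic illustration under stated assumptions. The feasible set is assumed to be an L1 ball (chosen only because its linear minimization oracle is a one-line argmax), gradients come from the standard two-point sphere-sampling estimator, and the estimates are smoothed with a momentum-style running average before each Frank-Wolfe step, a common way to stabilize stochastic Frank-Wolfe that may differ from the technique in the paper. The names `two_point_gradient`, `l1_ball_lmo`, and `bandit_frank_wolfe`, the step-size schedules, and the toy target are hypothetical. For simplicity the loss is fixed across rounds (a stochastic rather than adversarial online setting), and the query points x ± δu are allowed to leave the ball.

```python
import math
import random


def two_point_gradient(f, x, delta, rng):
    """Estimate the gradient of f at x from two function values.

    Standard two-point estimator:
        g = d / (2 * delta) * (f(x + delta*u) - f(x - delta*u)) * u,
    where u is drawn uniformly from the unit sphere in R^d.
    """
    d = len(x)
    # A normalized standard Gaussian vector is uniform on the unit sphere.
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(ui * ui for ui in u))
    u = [ui / norm for ui in u]
    plus = f([xi + delta * ui for xi, ui in zip(x, u)])
    minus = f([xi - delta * ui for xi, ui in zip(x, u)])
    scale = d * (plus - minus) / (2.0 * delta)
    return [scale * ui for ui in u]


def l1_ball_lmo(direction, radius):
    """Linear minimization oracle: argmin of <direction, v> over ||v||_1 <= radius."""
    i = max(range(len(direction)), key=lambda j: abs(direction[j]))
    v = [0.0] * len(direction)
    v[i] = -radius if direction[i] > 0 else radius
    return v


def bandit_frank_wolfe(f, d, radius, rounds, delta=0.1, seed=0):
    """Projection-free minimization of f over the L1 ball from function values only."""
    rng = random.Random(seed)
    x = [0.0] * d          # the origin is feasible
    avg_grad = [0.0] * d   # running average of the gradient estimates
    for t in range(rounds):
        g = two_point_gradient(f, x, delta, rng)
        # Hypothetical schedules: averaging weight rho_t and step size gamma_t.
        rho = min(1.0, 4.0 / (t + 8) ** (2.0 / 3.0))
        gamma = 2.0 / (t + 8)
        avg_grad = [(1.0 - rho) * a + rho * gi for a, gi in zip(avg_grad, g)]
        v = l1_ball_lmo(avg_grad, radius)
        # A convex combination of feasible points stays feasible: no projection.
        x = [(1.0 - gamma) * xi + gamma * vi for xi, vi in zip(x, v)]
    return x


# Usage on a toy problem: squared distance to a hypothetical target,
# observed with small additive Gaussian noise.
target = [0.3, -0.2]
noise = random.Random(1)


def noisy_loss(x):
    return sum((xi - ci) ** 2 for xi, ci in zip(x, target)) + noise.gauss(0.0, 0.01)


x_final = bandit_frank_wolfe(noisy_loss, d=2, radius=1.0, rounds=2000)
print(x_final)
```

The design point is that the set is touched only through the linear oracle: each iterate is a convex combination of feasible points, so no projection is ever computed, and the averaging step damps the variance that raw bandit gradient estimates would otherwise inject into the choice of vertex.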


Pages: 24
