Online Boosting with Bandit Feedback

Cited by: 0

Authors:

Brukhim, Nataly [1, 2]

Hazan, Elad [1, 2]

Affiliations:

[1] Princeton Univ, Dept Comp Sci, Princeton, NJ 08544 USA

[2] Google AI, Princeton, NJ 08544 USA

Source:

ALGORITHMIC LEARNING THEORY, VOL 132 | 2021 / Vol. 132

Keywords:

ALGORITHMS; OPTIMIZATION;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405;

Abstract:

We consider the problem of online boosting for regression tasks when only limited information is available to the learner. This setting is motivated by applications in reinforcement learning, in which only partial feedback is provided to the learner. We give an efficient regret minimization method that has two implications. First, we describe an online boosting algorithm with noisy multi-point bandit feedback. Next, we give a new projection-free online convex optimization algorithm with stochastic gradient access that improves on state-of-the-art guarantees in terms of efficiency. Our analysis offers a novel way of incorporating stochastic gradient estimators within Frank-Wolfe-type methods, which circumvents the instability encountered when directly applying projection-free optimization to the stochastic setting.


Pages: 24


[1] Online Multiclass Boosting with Bandit Feedback
Zhang, Daniel T.
Jung, Young Hun
Tewari, Ambuj
22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89
[2] Boosting with Online Binary Learners for the Multiclass Bandit Problem
Chen, Shang-Tse
Lin, Hsuan-Tien
Lu, Chi-Jen
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 1), 2014, 32
[3] Beyond Bandit Feedback in Online Multiclass Classification
van der Hoeven, Dirk
Fusco, Federico
Cesa-Bianchi, Nicolò
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
[4] Online Spectral Learning on a Graph with Bandit Feedback
Gu, Quanquan
Han, Jiawei
2014 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2014, : 833 - 838
[5] Multiclass Online Learnability under Bandit Feedback
Raman, Ananth
Raman, Vinod
Subedi, Unique
Mehalel, Idan
Tewari, Ambuj
INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 237, 2024, 237
[6] Mixtron: Bandit Online Multiclass Prediction with Implicit Feedback
Feng, Wanjin
Shi, Hailong
Zhao, Peilin
Gao, Xingyu
23RD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, ICDM 2023, 2023, : 1004 - 1012
[7] Online Learning With Incremental Feature Space and Bandit Feedback
Gu, Shilin
Luo, Tingjin
He, Ming
Hou, Chenping
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (12) : 12902 - 12916
[8] Online Markov Decision Processes Under Bandit Feedback
Neu, Gergely
Gyoergy, Andras
Szepesvari, Csaba
Antos, Andras
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2014, 59 (03) : 676 - 691
[9] Online Stochastic Optimization under Correlated Bandit Feedback
Azar, Mohammad Gheshlaghi
Lazaric, Alessandro
Brunskill, Emma
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 2), 2014, 32 : 1557 - 1565
[10] Push-Sum Distributed Online Optimization With Bandit Feedback
Wang, Cong
Xu, Shengyuan
Yuan, Deming
Zhang, Baoyong
Zhang, Zhengqiang
IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (04) : 2263 - 2273
