Online Boosting with Bandit Feedback

Cited by: 0
Authors
Brukhim, Nataly [1, 2]
Hazan, Elad [1, 2]
Affiliations
[1] Princeton Univ, Dept Comp Sci, Princeton, NJ 08544 USA
[2] Google AI, Princeton, NJ 08544 USA
Source
Keywords
ALGORITHMS; OPTIMIZATION;
DOI
n/a
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
We consider the problem of online boosting for regression tasks, when only limited information is available to the learner. This setting is motivated by applications in reinforcement learning, in which only partial feedback is provided to the learner. We give an efficient regret minimization method that has two implications. First, we describe an online boosting algorithm with noisy multi-point bandit feedback. Next, we give a new projection-free online convex optimization algorithm with stochastic gradient access, that improves state-of-the-art guarantees in terms of efficiency. Our analysis offers a novel way of incorporating stochastic gradient estimators within Frank-Wolfe-type methods, which circumvents the instability encountered when directly applying projection-free optimization to the stochastic setting.
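The abstract combines two standard ingredients: a multi-point bandit gradient estimator (the learner only observes loss values, not gradients) and a projection-free, Frank-Wolfe-type update (a linear minimization oracle replaces projection). The sketch below illustrates those two generic building blocks only; the function names, step sizes, and estimator form are illustrative assumptions, not the authors' algorithm or analysis.

```python
import random

def two_point_gradient_estimate(f, x, delta):
    """Two-point bandit gradient estimator (illustrative): query the loss
    at two symmetric perturbations of x along a random unit direction u,
    and return d * (f(x+du) - f(x-du)) / (2*delta) * u, whose expectation
    approximates the gradient of f at x."""
    d = len(x)
    u = [random.gauss(0.0, 1.0) for _ in range(d)]
    norm = sum(ui * ui for ui in u) ** 0.5
    u = [ui / norm for ui in u]
    x_plus = [xi + delta * ui for xi, ui in zip(x, u)]
    x_minus = [xi - delta * ui for xi, ui in zip(x, u)]
    scale = d * (f(x_plus) - f(x_minus)) / (2 * delta)
    return [scale * ui for ui in u]

def frank_wolfe_step(x, grad, vertices, eta):
    """One projection-free (Frank-Wolfe-type) step over the convex hull of
    `vertices`: a linear minimization oracle picks the vertex most aligned
    with -grad, then the iterate moves a step of size eta toward it."""
    v_star = min(vertices, key=lambda v: sum(g * vi for g, vi in zip(grad, v)))
    return [(1 - eta) * xi + eta * vi for xi, vi in zip(x, v_star)]

if __name__ == "__main__":
    # Toy usage: minimize ||x - target||^2 over the probability simplex
    # using only two loss evaluations per round (bandit feedback).
    random.seed(0)
    vertices = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    target = [0.2, 0.3, 0.5]
    f = lambda x: sum((xi - ti) ** 2 for xi, ti in zip(x, target))
    x = [1.0, 0.0, 0.0]
    for t in range(300):
        g = two_point_gradient_estimate(f, x, 0.01)
        x = frank_wolfe_step(x, g, vertices, 2.0 / (t + 2))
    print(f(x))
```

Because the Frank-Wolfe step takes a convex combination of simplex vertices, the iterate stays feasible without any projection; the paper's contribution, per the abstract, is a stable way to feed such stochastic gradient estimates into Frank-Wolfe-type methods.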
Pages: 24
Related Papers
50 items total
  • [21] Distributed Online Stochastic-Constrained Convex Optimization With Bandit Feedback
    Wang, Cong
    Xu, Shengyuan
    Yuan, Deming
    IEEE TRANSACTIONS ON CYBERNETICS, 2024, 54 (01) : 63 - 75
  • [22] On the Time-Varying Constraints and Bandit Feedback of Online Convex Optimization
    Cao, Xuanyu
    Liu, K. J. Ray
    2018 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2018,
  • [23] Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback
    Lin, Tianyi
    Pacchiano, Aldo
    Yu, Yaodong
    Jordan, Michael I.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [24] Adaptive Client Sampling in Federated Learning via Online Learning with Bandit Feedback
    Zhao, Boxin
    Wang, Lingxiao
    Liu, Ziqi
    Zhang, Zhiqiang
    Zhou, Jun
    Chen, Chaochao
    Kolar, Mladen
    JOURNAL OF MACHINE LEARNING RESEARCH, 2025, 26 : 1 - 67
  • [25] Online Continuous Submodular Maximization: From Full-Information to Bandit Feedback
    Zhang, Mingrui
    Chen, Lin
    Hassani, Hamed
    Karbasi, Amin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [26] Event-triggered distributed online convex optimization with delayed bandit feedback
    Xiong, Menghui
    Zhang, Baoyong
    Yuan, Deming
    Zhang, Yijun
    Chen, Jun
    APPLIED MATHEMATICS AND COMPUTATION, 2023, 445
  • [27] Online Multiclass Learning with "Bandit" Feedback under a Confidence-Weighted Approach
    Shi, Chaoran
    Wang, Xiong
    Tian, Xiaohua
    Gan, Xiaoying
    Wang, Xinbing
    2016 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2016,
  • [28] Online bandit convex optimisation with stochastic constraints via two-point feedback
    Yu, Jichi
    Li, Jueyou
    Chen, Guo
    INTERNATIONAL JOURNAL OF SYSTEMS SCIENCE, 2023, 54 (10) : 2089 - 2105
  • [29] Push-sum Distributed Dual Averaging Online Convex Optimization With Bandit Feedback
    Yang, Ju
    Wei, Mengli
    Wang, Yan
    Zhao, Zhongyuan
    INTERNATIONAL JOURNAL OF CONTROL AUTOMATION AND SYSTEMS, 2024, 22 (05) : 1461 - 1471
  • [30] Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback
    Ba, Wenjia
    Lin, Tianyi
    Zhang, Jiawei
    Zhou, Zhengyuan
    OPERATIONS RESEARCH, 2025,