Online Boosting with Bandit Feedback

Cited by: 0

Authors:

Brukhim, Nataly [1,2]

Hazan, Elad [1,2]

Affiliations:

[1] Princeton Univ, Dept Comp Sci, Princeton, NJ 08544 USA

[2] Google AI, Princeton, NJ 08544 USA

Source:

ALGORITHMIC LEARNING THEORY, VOL 132 | 2021 / Vol. 132

Keywords:

ALGORITHMS; OPTIMIZATION;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We consider the problem of online boosting for regression tasks, when only limited information is available to the learner. This setting is motivated by applications in reinforcement learning, in which only partial feedback is provided to the learner. We give an efficient regret minimization method that has two implications. First, we describe an online boosting algorithm with noisy multi-point bandit feedback. Next, we give a new projection-free online convex optimization algorithm with stochastic gradient access, that improves state-of-the-art guarantees in terms of efficiency. Our analysis offers a novel way of incorporating stochastic gradient estimators within Frank-Wolfe-type methods, which circumvents the instability encountered when directly applying projection-free optimization to the stochastic setting.


Pages: 24

50 related items in total

[41] Learning with Bandit Feedback in Potential Games
Cohen, Johanne
Heliou, Amelie
Mertikopoulos, Panayotis
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
[42] Learning in Congestion Games with Bandit Feedback
Cui, Qiwen
Xiong, Zhihan
Fazel, Maryam
Du, Simon S.
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
[43] Interactive Information Retrieval with Bandit Feedback
Wang, Huazheng
Jia, Yiling
Wang, Hongning
SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 2658 - 2661
[44] The Blinded Bandit: Learning with Adaptive Feedback
Dekel, Ofer
Hazan, Elad
Koren, Tomer
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27
[45] Threshold Bandit, With and Without Censored Feedback
Abernethy, Jacob
Amin, Kareem
Zhu, Ruihao
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
[46] Learning from eXtreme Bandit Feedback
Lopez, Romain
Dhillon, Inderjit S.
Jordan, Michael I.
THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 8732 - 8740
[47] STOCHASTIC CONVEX OPTIMIZATION WITH BANDIT FEEDBACK
Agarwal, Alekh
Foster, Dean P.
Hsu, Daniel
Kakade, Sham M.
Rakhlin, Alexander
SIAM JOURNAL ON OPTIMIZATION, 2013, 23 (01) : 213 - 240
[48] Online Learning Algorithm for Distributed Convex Optimization With Time-Varying Coupled Constraints and Bandit Feedback
Li, Jueyou
Gu, Chuanye
Wu, Zhiyou
Huang, Tingwen
IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (02) : 1009 - 1020
[49] Online Second Price Auction with Semi-Bandit Feedback under the Non-Stationary Setting
Zhao, Haoyu
Chen, Wei
THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 6893 - 6900
[50] Bandit Online Optimization over the Permutahedron
Ailon, Nir
Hatano, Kohei
Takimoto, Eiji
ALGORITHMIC LEARNING THEORY (ALT 2014), 2014, 8776 : 215 - 229
