Online Boosting with Bandit Feedback

Cited by: 0
Authors
Brukhim, Nataly [1, 2]
Hazan, Elad [1, 2]
Affiliations
[1] Princeton Univ, Dept Comp Sci, Princeton, NJ 08544 USA
[2] Google AI, Princeton, NJ 08544 USA
Source
Keywords
ALGORITHMS; OPTIMIZATION;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
We consider the problem of online boosting for regression tasks when only limited information is available to the learner. This setting is motivated by applications in reinforcement learning, in which only partial feedback is provided to the learner. We give an efficient regret minimization method that has two implications. First, we describe an online boosting algorithm with noisy multi-point bandit feedback. Second, we give a new projection-free online convex optimization algorithm with stochastic gradient access that improves on state-of-the-art guarantees in terms of efficiency. Our analysis offers a novel way of incorporating stochastic gradient estimators within Frank-Wolfe-type methods, which circumvents the instability encountered when directly applying projection-free optimization to the stochastic setting.
Pages: 24
Related papers
50 items in total
  • [31] New bounds on the price of bandit feedback for mistake-bounded online multiclass learning
    Long, Philip M.
    THEORETICAL COMPUTER SCIENCE, 2020, 808 : 159 - 163
  • [32] Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback
    Wen, Zheng
    Kveton, Branislav
    Valko, Michal
    Vaswani, Sharan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [33] Bandit Learning with Implicit Feedback
    Qi, Yi
    Wu, Qingyun
    Wang, Hongning
    Tang, Jie
    Sun, Maosong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [34] Optimal Clustering with Bandit Feedback
    Yang, Junwen
    Zhong, Zixin
    Tan, Vincent Y. F.
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25
  • [35] Nearest Neighbour with Bandit Feedback
    Pasteris, Stephen
    Hicks, Chris
    Mavroudis, Vasilios
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [36] Structured Projection-free Online Convex Optimization with Multi-point Bandit Feedback
    Ding, Yuhao
    Lavaei, Javad
    2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2021, : 954 - 961
  • [37] Sharp bounds on the price of bandit feedback for several models of mistake-bounded online learning
    Feng, Raymond
    Geneson, Jesse
    Lee, Andrew
    Slettnes, Espen
    THEORETICAL COMPUTER SCIENCE, 2023, 965
  • [38] Vector Optimization with Stochastic Bandit Feedback
    Ararat, Cagin
    Tekin, Cem
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023, 206
  • [39] On Multilabel Classification and Ranking with Bandit Feedback
    Gentile, Claudio
    Orabona, Francesco
    JOURNAL OF MACHINE LEARNING RESEARCH, 2014, 15 : 2451 - 2487
  • [40] Bandit Learning with Biased Human Feedback
    Tang, Wei
    Ho, Chien-Ju
    AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019, : 1324 - 1332