Online Boosting with Bandit Feedback

Cited by: 0
Authors
Brukhim, Nataly [1, 2]
Hazan, Elad [1, 2]
Affiliations
[1] Princeton Univ, Dept Comp Sci, Princeton, NJ 08544 USA
[2] Google AI, Princeton, NJ 08544 USA
Source
Keywords
ALGORITHMS; OPTIMIZATION;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We consider the problem of online boosting for regression tasks when only limited information is available to the learner. This setting is motivated by applications in reinforcement learning, in which only partial feedback is provided to the learner. We give an efficient regret minimization method that has two implications. First, we describe an online boosting algorithm with noisy multi-point bandit feedback. Next, we give a new projection-free online convex optimization algorithm with stochastic gradient access that improves state-of-the-art guarantees in terms of efficiency. Our analysis offers a novel way of incorporating stochastic gradient estimators within Frank-Wolfe-type methods, which circumvents the instability encountered when directly applying projection-free optimization to the stochastic setting.
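As a rough illustration of the projection-free idea mentioned in the abstract, the sketch below runs a generic Frank-Wolfe-style loop that feeds an averaged noisy gradient into a linear minimization oracle instead of projecting. It is not the algorithm from the paper: the function names, the L1-ball domain, the step size 2/(t+2), the averaging weight 1/sqrt(t), and the quadratic test objective are all assumptions chosen for illustration.

```python
import random


def lmo_l1_ball(grad, radius=1.0):
    """Linear minimization oracle: a minimizer of <v, grad> over the L1 ball."""
    # The minimizer is a signed vertex on the coordinate with the largest |gradient|.
    i = max(range(len(grad)), key=lambda k: abs(grad[k]))
    v = [0.0] * len(grad)
    if grad[i] != 0.0:
        v[i] = -radius if grad[i] > 0 else radius
    return v


def frank_wolfe_stochastic(noisy_grad, dim, steps, radius=1.0, seed=0):
    """Frank-Wolfe-style loop driven by an averaged stochastic gradient.

    The averaging weight and step size are illustrative assumptions,
    not the schedule analyzed in the paper.
    """
    rng = random.Random(seed)
    x = [0.0] * dim          # start at the center of the ball
    g_avg = [0.0] * dim      # running average of noisy gradients
    for t in range(1, steps + 1):
        g = noisy_grad(x, rng)
        rho = 1.0 / t ** 0.5
        # Averaging damps the noise before it reaches the oracle.
        g_avg = [(1 - rho) * a + rho * b for a, b in zip(g_avg, g)]
        v = lmo_l1_ball(g_avg, radius)
        eta = 2.0 / (t + 2)
        # Convex combination of feasible points: no projection is needed.
        x = [(1 - eta) * xi + eta * vi for xi, vi in zip(x, v)]
    return x


def noisy_quadratic_grad(x, rng, sigma=0.1):
    """Hypothetical oracle: gradient of 0.5 * ||x - c||^2 plus Gaussian noise."""
    c = [0.5, -0.2, 0.0]
    return [xi - ci + rng.gauss(0.0, sigma) for xi, ci in zip(x, c)]


if __name__ == "__main__":
    x_final = frank_wolfe_stochastic(noisy_quadratic_grad, dim=3, steps=200)
    print("final iterate:", x_final)
    print("L1 norm:", sum(abs(xi) for xi in x_final))
```

Because every iterate is a convex combination of points inside the ball, feasibility holds by construction, which is the property that lets Frank-Wolfe-type methods avoid a projection step.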
Pages: 24
Related papers
50 in total
  • [1] Online Multiclass Boosting with Bandit Feedback
    Zhang, Daniel T.
    Jung, Young Hun
    Tewari, Ambuj
    22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89
  • [2] Boosting with Online Binary Learners for the Multiclass Bandit Problem
    Chen, Shang-Tse
    Lin, Hsuan-Tien
    Lu, Chi-Jen
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 1), 2014, 32
  • [3] Beyond Bandit Feedback in Online Multiclass Classification
    van der Hoeven, Dirk
    Fusco, Federico
    Cesa-Bianchi, Nicolo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [4] Online Spectral Learning on a Graph with Bandit Feedback
    Gu, Quanquan
    Han, Jiawei
    2014 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2014, : 833 - 838
  • [5] Multiclass Online Learnability under Bandit Feedback
    Raman, Ananth
    Raman, Vinod
    Subedi, Unique
    Mehalel, Idan
    Tewari, Ambuj
    INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 237, 2024, 237
  • [6] Mixtron: Bandit Online Multiclass Prediction with Implicit Feedback
    Feng, Wanjin
    Shi, Hailong
    Zhao, Peilin
    Gao, Xingyu
    23RD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, ICDM 2023, 2023, : 1004 - 1012
  • [7] Online Learning With Incremental Feature Space and Bandit Feedback
    Gu, Shilin
    Luo, Tingjin
    He, Ming
    Hou, Chenping
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (12) : 12902 - 12916
  • [8] Online Markov Decision Processes Under Bandit Feedback
    Neu, Gergely
    Gyoergy, Andras
    Szepesvari, Csaba
    Antos, Andras
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2014, 59 (03) : 676 - 691
  • [9] Online Stochastic Optimization under Correlated Bandit Feedback
    Azar, Mohammad Gheshlaghi
    Lazaric, Alessandro
    Brunskill, Emma
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 2), 2014, 32 : 1557 - 1565
  • [10] Push-Sum Distributed Online Optimization With Bandit Feedback
    Wang, Cong
    Xu, Shengyuan
    Yuan, Deming
    Zhang, Baoyong
    Zhang, Zhengqiang
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (04) : 2263 - 2273