Online Boosting with Bandit Feedback

Cited by: 0
Authors
Brukhim, Nataly [1, 2]
Hazan, Elad [1, 2]
Affiliations
[1] Princeton Univ, Dept Comp Sci, Princeton, NJ 08544 USA
[2] Google AI, Princeton, NJ 08544 USA
Source
Keywords
ALGORITHMS; OPTIMIZATION;
DOI
n/a
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
We consider the problem of online boosting for regression tasks, when only limited information is available to the learner. This setting is motivated by applications in reinforcement learning, in which only partial feedback is provided to the learner. We give an efficient regret minimization method that has two implications. First, we describe an online boosting algorithm with noisy multi-point bandit feedback. Next, we give a new projection-free online convex optimization algorithm with stochastic gradient access, that improves state-of-the-art guarantees in terms of efficiency. Our analysis offers a novel way of incorporating stochastic gradient estimators within Frank-Wolfe-type methods, which circumvents the instability encountered when directly applying projection-free optimization to the stochastic setting.
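The abstract combines two standard ingredients: a multi-point bandit gradient estimator (the learner only observes loss values, not gradients) and a projection-free, Frank-Wolfe-type update (a linear minimization oracle replaces projection). The sketch below illustrates those two generic building blocks only; the function names, step sizes, and estimator form are illustrative assumptions, not the authors' algorithm or analysis.

```python
import random

def two_point_gradient_estimate(f, x, delta):
    """Two-point bandit gradient estimator (illustrative): query the loss
    at two symmetric perturbations of x along a random unit direction u,
    and return d * (f(x+du) - f(x-du)) / (2*delta) * u, whose expectation
    approximates the gradient of f at x."""
    d = len(x)
    u = [random.gauss(0.0, 1.0) for _ in range(d)]
    norm = sum(ui * ui for ui in u) ** 0.5
    u = [ui / norm for ui in u]
    x_plus = [xi + delta * ui for xi, ui in zip(x, u)]
    x_minus = [xi - delta * ui for xi, ui in zip(x, u)]
    scale = d * (f(x_plus) - f(x_minus)) / (2 * delta)
    return [scale * ui for ui in u]

def frank_wolfe_step(x, grad, vertices, eta):
    """One projection-free (Frank-Wolfe-type) step over the convex hull of
    `vertices`: a linear minimization oracle picks the vertex most aligned
    with -grad, then the iterate moves a step of size eta toward it."""
    v_star = min(vertices, key=lambda v: sum(g * vi for g, vi in zip(grad, v)))
    return [(1 - eta) * xi + eta * vi for xi, vi in zip(x, v_star)]

if __name__ == "__main__":
    # Toy usage: minimize ||x - target||^2 over the probability simplex
    # using only two loss evaluations per round (bandit feedback).
    random.seed(0)
    vertices = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    target = [0.2, 0.3, 0.5]
    f = lambda x: sum((xi - ti) ** 2 for xi, ti in zip(x, target))
    x = [1.0, 0.0, 0.0]
    for t in range(300):
        g = two_point_gradient_estimate(f, x, 0.01)
        x = frank_wolfe_step(x, g, vertices, 2.0 / (t + 2))
    print(f(x))
```

Because the Frank-Wolfe step takes a convex combination of simplex vertices, the iterate stays feasible without any projection; the paper's contribution, per the abstract, is a stable way to feed such stochastic gradient estimates into Frank-Wolfe-type methods.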
Pages: 24
Related Papers
50 items total
  • [21] Distributed Online Stochastic-Constrained Convex Optimization With Bandit Feedback
    Wang, Cong
    Xu, Shengyuan
    Yuan, Deming
    IEEE TRANSACTIONS ON CYBERNETICS, 2024, 54 (01) : 63 - 75
  • [22] On the Time-Varying Constraints and Bandit Feedback of Online Convex Optimization
    Cao, Xuanyu
    Liu, K. J. Ray
    2018 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2018,
  • [23] Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback
    Lin, Tianyi
    Pacchiano, Aldo
    Yu, Yaodong
    Jordan, Michael I.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [24] Adaptive Client Sampling in Federated Learning via Online Learning with Bandit Feedback
    Zhao, Boxin
    Wang, Lingxiao
    Liu, Ziqi
    Zhang, Zhiqiang
    Zhou, Jun
    Chen, Chaochao
    Kolar, Mladen
    JOURNAL OF MACHINE LEARNING RESEARCH, 2025, 26 : 1 - 67
  • [25] Online Continuous Submodular Maximization: From Full-Information to Bandit Feedback
    Zhang, Mingrui
    Chen, Lin
    Hassani, Hamed
    Karbasi, Amin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [26] Event-triggered distributed online convex optimization with delayed bandit feedback
    Xiong, Menghui
    Zhang, Baoyong
    Yuan, Deming
    Zhang, Yijun
    Chen, Jun
    APPLIED MATHEMATICS AND COMPUTATION, 2023, 445
  • [27] Online Multiclass Learning with "Bandit" Feedback under a Confidence-Weighted Approach
    Shi, Chaoran
    Wang, Xiong
    Tian, Xiaohua
    Gan, Xiaoying
    Wang, Xinbing
    2016 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2016,
  • [28] Online bandit convex optimisation with stochastic constraints via two-point feedback
    Yu, Jichi
    Li, Jueyou
    Chen, Guo
    INTERNATIONAL JOURNAL OF SYSTEMS SCIENCE, 2023, 54 (10) : 2089 - 2105
  • [29] Push-sum Distributed Dual Averaging Online Convex Optimization With Bandit Feedback
    Yang, Ju
    Wei, Mengli
    Wang, Yan
    Zhao, Zhongyuan
    INTERNATIONAL JOURNAL OF CONTROL AUTOMATION AND SYSTEMS, 2024, 22 (05) : 1461 - 1471
  • [30] Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback
    Ba, Wenjia
    Lin, Tianyi
    Zhang, Jiawei
    Zhou, Zhengyuan
    OPERATIONS RESEARCH, 2025,