Online Boosting with Bandit Feedback

Cited by: 0
Authors
Brukhim, Nataly [1, 2]
Hazan, Elad [1, 2]
Affiliations
[1] Princeton Univ, Dept Comp Sci, Princeton, NJ 08544 USA
[2] Google AI, Princeton, NJ 08544 USA
Source
Keywords
ALGORITHMS; OPTIMIZATION;
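DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract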
We consider the problem of online boosting for regression tasks, when only limited information is available to the learner. This setting is motivated by applications in reinforcement learning, in which only partial feedback is provided to the learner. We give an efficient regret minimization method that has two implications. First, we describe an online boosting algorithm with noisy multi-point bandit feedback. Next, we give a new projection-free online convex optimization algorithm with stochastic gradient access, that improves state-of-the-art guarantees in terms of efficiency. Our analysis offers a novel way of incorporating stochastic gradient estimators within Frank-Wolfe-type methods, which circumvents the instability encountered when directly applying projection-free optimization to the stochastic setting.
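To make the abstract's terms concrete, the following is a minimal sketch, not the authors' algorithm. It combines a two-point bandit gradient estimate with a plain Frank-Wolfe (projection-free) update over an L1 ball. This is the naive plug-in combination, which the abstract says becomes unstable under stochastic gradients and which the paper's analysis is designed to avoid. All function names, the L1-ball feasible set, the step size 2/(t+2), and the smoothing radius `delta` are assumptions chosen for illustration.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere (normalized Gaussians)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def two_point_gradient(f, x, delta, rng):
    """Two-point bandit estimate: (d / (2*delta)) * (f(x + delta*u) - f(x - delta*u)) * u.

    Only two function values are observed; no gradient is queried.
    """
    d = len(x)
    u = random_unit_vector(d, rng)
    f_plus = f([xi + delta * ui for xi, ui in zip(x, u)])
    f_minus = f([xi - delta * ui for xi, ui in zip(x, u)])
    scale = d * (f_plus - f_minus) / (2.0 * delta)
    return [scale * ui for ui in u]


def lmo_l1_ball(g, radius):
    """Linear minimization oracle over the L1 ball: a signed vertex on the largest |g_i|."""
    i = max(range(len(g)), key=lambda k: abs(g[k]))
    v = [0.0] * len(g)
    if g[i] != 0.0:
        v[i] = -radius * math.copysign(1.0, g[i])
    return v


def naive_bandit_frank_wolfe(f, dim, radius, steps, delta, seed=0):
    """Naive Frank-Wolfe loop driven by two-point estimates (illustrative baseline only).

    Assumption: f is defined slightly outside the ball, since the query
    points x +/- delta*u may leave the feasible set.
    """
    rng = random.Random(seed)
    x = [0.0] * dim
    for t in range(1, steps + 1):
        g = two_point_gradient(f, x, delta, rng)
        v = lmo_l1_ball(g, radius)
        eta = 2.0 / (t + 2)
        # Convex combination of feasible points, so no projection is needed.
        x = [(1 - eta) * xi + eta * vi for xi, vi in zip(x, v)]
    return x
```

Each iterate is a convex combination of points in the ball, so it stays feasible without a projection step. For a linear loss the two-point estimate equals d(a·u)u exactly, whatever the value of `delta`.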
Pages: 24
Related papers
50 in total
  • [41] Learning with Bandit Feedback in Potential Games
    Cohen, Johanne
    Heliou, Amelie
    Mertikopoulos, Panayotis
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [42] Learning in Congestion Games with Bandit Feedback
    Cui, Qiwen
    Xiong, Zhihan
    Fazel, Maryam
    Du, Simon S.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [43] Interactive Information Retrieval with Bandit Feedback
    Wang, Huazheng
    Jia, Yiling
    Wang, Hongning
    SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 2658 - 2661
  • [44] The Blinded Bandit: Learning with Adaptive Feedback
    Dekel, Ofer
    Hazan, Elad
    Koren, Tomer
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27
  • [45] Threshold Bandit, With and Without Censored Feedback
    Abernethy, Jacob
    Amin, Kareem
    Zhu, Ruihao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [46] Learning from eXtreme Bandit Feedback
    Lopez, Romain
    Dhillon, Inderjit S.
    Jordan, Michael I.
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 8732 - 8740
  • [47] STOCHASTIC CONVEX OPTIMIZATION WITH BANDIT FEEDBACK
    Agarwal, Alekh
    Foster, Dean P.
    Hsu, Daniel
    Kakade, Sham M.
    Rakhlin, Alexander
    SIAM JOURNAL ON OPTIMIZATION, 2013, 23 (01) : 213 - 240
  • [48] Online Learning Algorithm for Distributed Convex Optimization With Time-Varying Coupled Constraints and Bandit Feedback
    Li, Jueyou
    Gu, Chuanye
    Wu, Zhiyou
    Huang, Tingwen
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (02) : 1009 - 1020
  • [49] Online Second Price Auction with Semi-Bandit Feedback under the Non-Stationary Setting
    Zhao, Haoyu
    Chen, Wei
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 6893 - 6900
  • [50] Bandit Online Optimization over the Permutahedron
    Ailon, Nir
    Hatano, Kohei
    Takimoto, Eiji
    ALGORITHMIC LEARNING THEORY (ALT 2014), 2014, 8776 : 215 - 229