The Blinded Bandit: Learning with Adaptive Feedback

Cited: 0
Authors
Dekel, Ofer [1]
Hazan, Elad [2]
Koren, Tomer [2]
Affiliations
[1] Microsoft Res, Redmond, WA 98052 USA
[2] Technion, Haifa, Israel
Keywords
REGRET;
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study an online learning setting where the player is temporarily deprived of feedback each time it switches to a different action. Such a model of adaptive feedback naturally occurs in scenarios where the environment reacts to the player's actions and requires some time to recover and stabilize after the algorithm switches actions. This motivates a variant of the multi-armed bandit problem, which we call the blinded multi-armed bandit, in which no feedback is given to the algorithm whenever it switches arms. We develop efficient online learning algorithms for this problem and prove that they guarantee the same asymptotic regret as the optimal algorithms for the standard multi-armed bandit problem. This result stands in stark contrast to another recent result, which states that adding a switching cost to the standard multi-armed bandit makes it substantially harder to learn, and it provides a direct comparison of how feedback and loss contribute to the difficulty of an online learning problem. We also extend our results to the general prediction framework of bandit linear optimization, again attaining near-optimal regret bounds.
Pages: 9
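
A minimal sketch of the blinded feedback model described in the abstract: the player observes the loss of its chosen arm only on rounds where it did not just switch arms. The block-based exponential-weights learner below is an illustrative simulation under that assumption, not the algorithm from the paper; the names and parameters (BLOCK_LEN, eta, the random loss sequence) are hypothetical.

    import math
    import random

    K = 3          # number of arms
    T = 10000      # horizon (rounds)
    BLOCK_LEN = 2  # repeat each chosen arm for two consecutive rounds
    eta = math.sqrt(math.log(K) / (K * T / BLOCK_LEN))  # heuristic learning rate

    random.seed(0)
    losses = [[random.random() for _ in range(K)] for _ in range(T)]  # oblivious loss sequence

    weights = [1.0] * K
    prev_arm = None
    total_loss = 0.0
    t = 0
    while t < T:
        # Sample an arm from the exponential-weights distribution.
        total_w = sum(weights)
        probs = [w / total_w for w in weights]
        arm = random.choices(range(K), weights=probs)[0]

        est_loss = 0.0
        for s in range(min(BLOCK_LEN, T - t)):
            loss = losses[t + s][arm]
            total_loss += loss
            switched = (s == 0 and arm != prev_arm)
            if not switched:
                # Blinded feedback: the loss is observed only when the arm was
                # not just switched to; build an importance-weighted estimate.
                est_loss += loss / probs[arm]
            prev_arm = arm
        t += BLOCK_LEN

        # Exponential-weights update from the observed (non-blinded) rounds only.
        weights[arm] *= math.exp(-eta * est_loss)

    print("average per-round loss:", total_loss / T)

With BLOCK_LEN = 2, at most half of the rounds are blinded, which hints at why withholding feedback on switches need not degrade the achievable regret rate in the way a switching cost does, as the abstract contrasts.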
Related Papers
50 items in total
  • [1] Adaptive Client Sampling in Federated Learning via Online Learning with Bandit Feedback
    Zhao, Boxin
    Wang, Lingxiao
    Liu, Ziqi
    Zhang, Zhiqiang
    Zhou, Jun
    Chen, Chaochao
    Kolar, Mladen
    JOURNAL OF MACHINE LEARNING RESEARCH, 2025, 26 : 1 - 67
  • [2] Bandit Learning with Implicit Feedback
    Qi, Yi
    Wu, Qingyun
    Wang, Hongning
    Tang, Jie
    Sun, Maosong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [3] Bandit Learning with Biased Human Feedback
    Tang, Wei
    Ho, Chien-Ju
    AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019, : 1324 - 1332
  • [4] Learning with Bandit Feedback in Potential Games
    Cohen, Johanne
    Heliou, Amelie
    Mertikopoulos, Panayotis
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [5] Learning in Congestion Games with Bandit Feedback
    Cui, Qiwen
    Xiong, Zhihan
    Fazel, Maryam
    Du, Simon S.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [6] Learning from eXtreme Bandit Feedback
    Lopez, Romain
    Dhillon, Inderjit S.
    Jordan, Michael I.
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 8732 - 8740
  • [7] Efficient Counterfactual Learning from Bandit Feedback
    Narita, Yusuke
    Yasui, Shota
    Yata, Kohei
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 4634 - 4641
  • [8] Online Spectral Learning on a Graph with Bandit Feedback
    Gu, Quanquan
    Han, Jiawei
    2014 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2014, : 833 - 838
  • [9] Learning to Control Renewal Processes with Bandit Feedback
    Cayci, S.
    Eryilmaz, A.
    Srikant, R.
    PERFORMANCE EVALUATION REVIEW, 2019, 47 (01): 41 - 42
  • [10] Multiclass classification with bandit feedback using adaptive regularization
    Crammer, Koby
    Gentile, Claudio
    MACHINE LEARNING, 2013, 90 : 347 - 383