The Blinded Bandit: Learning with Adaptive Feedback

Cited by: 0
Authors
Dekel, Ofer [1 ]
Hazan, Elad [2 ]
Koren, Tomer [2 ]
Affiliations
[1] Microsoft Research, Redmond, WA 98052, USA
[2] Technion, Haifa, Israel
Keywords
Regret
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
We study an online learning setting in which the player is temporarily deprived of feedback each time it switches to a different action. This model of adaptive feedback arises naturally in scenarios where the environment reacts to the player's actions and requires some time to recover and stabilize after the algorithm switches actions. This motivates a variant of the multi-armed bandit problem, which we call the blinded multi-armed bandit, in which no feedback is given to the algorithm whenever it switches arms. We develop efficient online learning algorithms for this problem and prove that they guarantee the same asymptotic regret as the optimal algorithms for the standard multi-armed bandit problem. This result stands in stark contrast to another recent result, which states that adding a switching cost to the standard multi-armed bandit makes it substantially harder to learn, and provides a direct comparison of how feedback and loss contribute to the difficulty of an online learning problem. We also extend our results to the general prediction framework of bandit linear optimization, again attaining near-optimal regret bounds.
Pages: 9
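
The abstract specifies a feedback model rather than a particular algorithm. As a rough illustration only, the following Python sketch simulates that interaction protocol; the names blinded_bandit_protocol and RandomPlayer, and the random loss table, are hypothetical placeholders rather than the authors' method. The point it makes concrete is that the loss is incurred on every round, but it is revealed to the player only on rounds where the player repeats its previous arm.

```python
import random

def blinded_bandit_protocol(player, losses, T):
    """Simulate the blinded-bandit feedback model described in the abstract:
    a loss is incurred every round, but no feedback is given on any round
    in which the player switches to a different arm."""
    prev_arm = None
    total_loss = 0.0
    for t in range(T):
        arm = player.choose()
        loss = losses[t][arm]      # adversary's loss for the chosen arm
        total_loss += loss         # the loss always counts toward regret ...
        if arm == prev_arm:        # ... but is observed only if no switch occurred
            player.observe(loss)
        prev_arm = arm
    return total_loss

class RandomPlayer:
    """Placeholder player: picks arms uniformly at random and ignores feedback."""
    def __init__(self, k):
        self.k = k
    def choose(self):
        return random.randrange(self.k)
    def observe(self, loss):
        pass  # a real learner would update its loss estimates here

if __name__ == "__main__":
    K, T = 3, 1000
    losses = [[random.random() for _ in range(K)] for _ in range(T)]
    print(blinded_bandit_protocol(RandomPlayer(K), losses, T))
```

A real learner for this setting would update its estimates inside observe() and control how often it switches; the paper's result is that, despite the withheld feedback, such a learner can achieve the same asymptotic regret as in the standard multi-armed bandit problem.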