The Blinded Bandit: Learning with Adaptive Feedback

Cited by: 0
Authors
Dekel, Ofer [1]
Hazan, Elad [2]
Koren, Tomer [2]
Affiliations
[1] Microsoft Res, Redmond, WA 98052 USA
[2] Technion, Haifa, Israel
Keywords
REGRET;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study an online learning setting where the player is temporarily deprived of feedback each time it switches to a different action. Such a model of adaptive feedback naturally occurs in scenarios where the environment reacts to the player's actions and requires some time to recover and stabilize after the algorithm switches actions. This motivates a variant of the multi-armed bandit problem, which we call the blinded multi-armed bandit, in which no feedback is given to the algorithm whenever it switches arms. We develop efficient online learning algorithms for this problem and prove that they guarantee the same asymptotic regret as the optimal algorithms for the standard multi-armed bandit problem. This result stands in stark contrast to another recent result, which states that adding a switching cost to the standard multi-armed bandit makes it substantially harder to learn. Together, the two results provide a direct comparison of how feedback and loss contribute to the difficulty of an online learning problem. We also extend our results to the general prediction framework of bandit linear optimization, again attaining near-optimal regret bounds.
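The feedback rule described in the abstract can be illustrated with a minimal sketch. It models only the observation protocol (the loss is hidden in any round where the arm differs from the previous one), not the algorithms developed in the paper. The function name `blinded_feedback`, the loss-table layout, and the treatment of the first round as observed are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def blinded_feedback(
    arms: Sequence[int], losses: Sequence[Sequence[float]]
) -> List[Optional[float]]:
    """Return what the player observes in each round (hypothetical helper).

    arms[t] is the arm played in round t, and losses[t][i] is the loss of
    arm i in round t. The loss is incurred in every round, but it is
    observed only when the arm is unchanged from the previous round.
    """
    observed: List[Optional[float]] = []
    prev_arm: Optional[int] = None
    for t, arm in enumerate(arms):
        switched = prev_arm is not None and arm != prev_arm
        # Assumption: round 0 has no previous arm and counts as observed.
        observed.append(None if switched else losses[t][arm])
        prev_arm = arm
    return observed


if __name__ == "__main__":
    losses = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.4, 0.6]]
    arms = [0, 0, 1, 1]
    # The switch to arm 1 in round 2 hides that round's loss.
    print(blinded_feedback(arms, losses))  # [0.1, 0.2, None, 0.6]
```

Under this rule, a player that alternates arms every round would observe nothing after the first round, which is why any algorithm for this setting has to commit to an arm for at least two consecutive rounds to learn from it.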
Pages: 9