Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability

Cited by: 0
Authors
Saha, Aadirupa [1 ]
Krishnamurthy, Akshay [1 ]
Affiliations
[1] Microsoft Research, New York, NY 10012, USA
Keywords
Contextual; Dueling Bandits; Preference-based learning; Realizability; Function approximation; Regret analysis; Best-Response; Policy regret; Regression oracles; Efficient; Optimal algorithms; Markov games; Linear realizability; Agnostic
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study the K-armed contextual dueling bandit problem, a sequential decision-making setting in which the learner uses contextual information to make two decisions, but only observes preference-based feedback suggesting that one decision was better than the other. We focus on the regret minimization problem under realizability, where the feedback is generated by a pairwise preference matrix that is well-specified by a given function class F. We provide a new algorithm that achieves the optimal regret rate for a new notion of best response regret, which is a strictly stronger performance measure than those considered in prior works. The algorithm is also computationally efficient, running in polynomial time assuming access to an online oracle for square loss regression over F. This resolves an open problem of Dudik et al. (2015) on oracle-efficient, regret-optimal algorithms for contextual dueling bandits.
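The interaction protocol the abstract describes can be made concrete with a minimal sketch: at each round the learner sees a context, selects a pair of arms, and observes only a binary duel outcome generated by a realizable pairwise preference model; observed outcomes are fed to an online square-loss regression oracle. This is an illustration of the problem setting only, not the paper's algorithm: the uniform pair selection, the logistic ground-truth model `pref_prob`, and the tabular running-mean oracle `predict` are all hypothetical stand-ins for the paper's oracle-driven exploration over a general function class F.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d, T = 4, 3, 500  # arms, context dimension, horizon (illustrative sizes)

# Hypothetical realizable ground truth: probability that arm a beats arm b
# in context x, via a logistic link on a score difference.
theta = rng.normal(size=(K, d))

def pref_prob(x, a, b):
    return 1.0 / (1.0 + np.exp(-(theta[a] - theta[b]) @ x))

# Toy online square-loss regression "oracle": a running mean per ordered
# arm pair, standing in for regression over a richer function class F.
counts = np.zeros((K, K))
sums = np.zeros((K, K))

def predict(a, b):
    n = counts[a, b]
    return sums[a, b] / n if n > 0 else 0.5

for t in range(T):
    x = rng.normal(size=d)
    # Sketch only: uniform exploration instead of oracle-guided selection.
    a, b = rng.choice(K, size=2, replace=False)
    # Duel outcome: 1 if arm a beats arm b in context x.
    y = float(rng.random() < pref_prob(x, a, b))
    sums[a, b] += y
    counts[a, b] += 1
    sums[b, a] += 1.0 - y
    counts[b, a] += 1
```

By construction the oracle's estimates are consistent across a pair's two orderings, i.e. `predict(a, b) + predict(b, a) == 1` once the pair has been dueled, mirroring the skew-symmetry of a preference matrix.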
Pages: 27