Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability

Cited by: 0
Authors
Saha, Aadirupa [1 ]
Krishnamurthy, Akshay [1 ]
Affiliation
[1] Microsoft Research, New York, NY 10012, USA
Keywords
Contextual; Dueling Bandits; Preference-based learning; Realizability; Function approximation; Regret analysis; Best-Response; Policy regret; Regression oracles; Efficient; Optimal algorithms; Markov games; Linear realizability; Agnostic;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We study the K-armed contextual dueling bandit problem, a sequential decision-making setting in which the learner uses contextual information to make two decisions, but only observes preference-based feedback suggesting that one decision was better than the other. We focus on the regret minimization problem under realizability, where the feedback is generated by a pairwise preference matrix that is well-specified by a given function class F. We provide a new algorithm that achieves the optimal regret rate for a new notion of best-response regret, which is a strictly stronger performance measure than those considered in prior works. The algorithm is also computationally efficient, running in polynomial time assuming access to an online oracle for square loss regression over F. This resolves an open problem of Dudik et al. (2015) on oracle-efficient, regret-optimal algorithms for contextual dueling bandits.
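To make the setup concrete, the sketch below simulates the interaction protocol the abstract describes: at each round the learner observes a context, selects a pair of arms, receives a single Bernoulli preference outcome, and feeds it to an online square-loss regression oracle. The selection rule (epsilon-greedy), the linear-logit ground-truth preference model, the feature map, and the per-round regret proxy are all illustrative assumptions; this is not the paper's algorithm, only a minimal sketch of the environment and oracle interface it builds on.

```python
import numpy as np

# Minimal sketch of the contextual dueling bandit protocol with an online
# square-loss regression oracle. All model and design choices below are
# illustrative assumptions, not the algorithm or definitions from the paper.

rng = np.random.default_rng(0)
K, d, T = 5, 8, 2000          # arms, context dimension, horizon

# Assumed ground-truth preference model (linear logit), one example of a
# realizable class F: P*(a beats b | context c).
theta_star = rng.normal(size=(K, d)) / np.sqrt(d)

def pref_prob(c, a, b):
    """Probability that arm a beats arm b given context c."""
    return 1.0 / (1.0 + np.exp(-(theta_star[a] - theta_star[b]) @ c))

class OnlineLeastSquaresOracle:
    """Online ridge regression over pairwise features; predicts P(a beats b | c)."""
    def __init__(self, dim, lam=1.0):
        self.A = lam * np.eye(dim)
        self.b = np.zeros(dim)

    def predict(self, phi):
        w = np.linalg.solve(self.A, self.b)
        return float(np.clip(phi @ w + 0.5, 0.0, 1.0))

    def update(self, phi, y):
        # Square-loss update on the observed preference outcome y in {0, 1}.
        self.A += np.outer(phi, phi)
        self.b += phi * (y - 0.5)

def features(c, a, b):
    """Antisymmetric pairwise feature map (an illustrative assumption)."""
    phi = np.zeros(K * d)
    phi[a * d:(a + 1) * d] += c
    phi[b * d:(b + 1) * d] -= c
    return phi

oracle = OnlineLeastSquaresOracle(K * d)
cum_regret = 0.0
for t in range(T):
    c = rng.normal(size=d)
    # Placeholder selection rule (epsilon-greedy on oracle predictions);
    # the paper instead derives an optimal, oracle-efficient rule.
    if rng.random() < 0.1:
        a, b = rng.choice(K, size=2, replace=False)
    else:
        scores = [sum(oracle.predict(features(c, i, j)) for j in range(K))
                  for i in range(K)]
        a, b = np.argsort(scores)[-2:]
    y = rng.random() < pref_prob(c, a, b)   # preference feedback: did a beat b?
    oracle.update(features(c, a, b), float(y))
    # Best-response-style regret proxy (one plausible measure, not necessarily
    # the paper's exact definition): margin of the best single arm against the
    # better of the two arms actually played.
    best = max(max(pref_prob(c, i, a), pref_prob(c, i, b)) for i in range(K))
    cum_regret += best - 0.5

print(f"cumulative best-response regret proxy after {T} rounds: {cum_regret:.1f}")
```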
Pages: 27
Related Papers
50 records in total
  • [41] Optimal cross-learning for contextual bandits with unknown context distributions
    Schneider, Jon
    Zimmert, Julian
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [42] Robust Stochastic Linear Contextual Bandits Under Adversarial Attacks
    Ding, Qin
    Hsieh, Cho-Jui
    Sharpnack, James
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [43] Incentivizing Exploration in Linear Contextual Bandits under Information Gap
    Wang, Huazheng
    Xu, Haifeng
    Li, Chuanhao
    Liu, Zhiyuan
    Wang, Hongning
    PROCEEDINGS OF THE 17TH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2023, 2023, : 415 - 425
  • [44] Efficient Algorithms for Learning to Control Bandits with Unobserved Contexts
    Park, Hongju
    Faradonbeh, Mohamad Kazem Shirani
    IFAC PAPERSONLINE, 2022, 55 (12): 383 - 388
  • [45] Oracle-Efficient Pessimism: Offline Policy Optimization In Contextual Bandits
    Wang, Lequn
    Krishnamurthy, Akshay
    Slivkins, Aleksandrs
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [46] Contexts can be Cheap: Solving Stochastic Contextual Bandits with Linear Bandit Algorithms
    Hanna, Osama A.
    Yang, Lin F.
    Fragouli, Christina
    THIRTY SIXTH ANNUAL CONFERENCE ON LEARNING THEORY, VOL 195, 2023, 195
  • [47] Universal and data-adaptive algorithms for model selection in linear contextual bandits
    Muthukumar, Vidya
    Krishnamurthy, Akshay
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [48] Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces
    Zhu, Yinglun
    Mineiro, Paul
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [49] Efficient Beam Alignment in Millimeter Wave Systems Using Contextual Bandits
    Hashemi, Morteza
    Sabharwal, Ashutosh
    Koksal, C. Emre
    Shroff, Ness B.
    IEEE CONFERENCE ON COMPUTER COMMUNICATIONS (IEEE INFOCOM 2018), 2018, : 2393 - 2401
  • [50] A Simple and Provably Efficient Algorithm for Asynchronous Federated Contextual Linear Bandits
    He, Jiafan
    Wang, Tianhao
    Min, Yifei
    Gu, Quanquan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,