Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability

Cited by: 0
Authors
Saha, Aadirupa [1]
Krishnamurthy, Akshay [1]
Affiliation
[1] Microsoft Research, New York, NY 10012 USA
Keywords
Contextual; Dueling Bandits; Preference-based learning; Realizability; Function approximation; Regret analysis; Best-Response; Policy regret; Regression oracles; Efficient; Optimal algorithms; Markov games; Linear realizability; Agnostic
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We study the K-armed contextual dueling bandit problem, a sequential decision-making setting in which the learner uses contextual information to make two decisions but only observes preference-based feedback suggesting that one decision was better than the other. We focus on regret minimization under realizability, where the feedback is generated by a pairwise preference matrix that is well-specified by a given function class F. We provide a new algorithm that achieves the optimal regret rate for a new notion of best-response regret, a strictly stronger performance measure than those considered in prior work. The algorithm is also computationally efficient: it runs in polynomial time, assuming access to an online oracle for square-loss regression over F. This resolves an open problem of Dudik et al. (2015) on oracle-efficient, regret-optimal algorithms for contextual dueling bandits.
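The abstract describes the interaction protocol only at a high level. The sketch below simulates it under stated assumptions and is not the authors' algorithm, whose selection rule the abstract does not give. Assumed (hypothetical) pieces: a linear, skew-symmetric preference model `pref_prob` that the regression class can represent exactly (realizability), an online square-loss regression oracle implemented as online gradient descent (`OnlineSquareLossOracle`), and a placeholder ε-greedy rule for choosing the two arms in `run`.

```python
import random


def pref_prob(theta, x, a, b):
    """Hypothetical realizable model: P(a beats b | x) = 1/2 + <theta, x[a] - x[b]> / 2.

    Skew-symmetric by construction, so P(a beats b) + P(b beats a) = 1.
    """
    diff = sum(t * (xa - xb) for t, xa, xb in zip(theta, x[a], x[b]))
    return 0.5 + 0.5 * diff


class OnlineSquareLossOracle:
    """Online gradient descent on the square loss over linear predictors z -> <w, z>."""

    def __init__(self, dim, lr):
        self.w = [0.0] * dim
        self.lr = lr

    def predict(self, z):
        return sum(wi * zi for wi, zi in zip(self.w, z))

    def update(self, z, y):
        # Gradient of 0.5 * (<w, z> - y)^2 with respect to w is (<w, z> - y) * z.
        err = self.predict(z) - y
        self.w = [wi - self.lr * err * zi for wi, zi in zip(self.w, z)]


def run(theta_star, T=5000, K=5, eps=0.2, lr=0.02, seed=0):
    """Simulate T rounds of the dueling protocol and return the oracle's weights."""
    rng = random.Random(seed)
    d = len(theta_star)
    oracle = OnlineSquareLossOracle(d, lr)
    for _ in range(T):
        # Context: one feature vector in [0, 1]^d per arm.
        x = [[rng.random() for _ in range(d)] for _ in range(K)]
        if rng.random() < eps:
            # Explore: duel a uniformly random pair of distinct arms.
            a, b = rng.sample(range(K), 2)
        else:
            # Exploit (placeholder rule): duel the two arms with the highest
            # estimated utility <w, x[k]>.
            ranked = sorted(range(K), key=lambda k: oracle.predict(x[k]), reverse=True)
            a, b = ranked[0], ranked[1]
        # Only a binary preference is revealed: o = 1 means "a beat b".
        o = 1 if rng.random() < pref_prob(theta_star, x, a, b) else 0
        # Regress y = 2o - 1 on z = x[a] - x[b]. Since E[y | z] = <theta_star, z>,
        # the linear class is well-specified (realizable) for this target.
        z = [xa - xb for xa, xb in zip(x[a], x[b])]
        oracle.update(z, 2 * o - 1)
    return oracle.w


# Hypothetical parameter: nonnegative entries summing to 1 keep every probability in [0, 1].
theta_star = [0.6, 0.3, 0.1]
w_hat = run(theta_star)
print([round(v, 2) for v in w_hat])
```

The ε-greedy choice is only a stand-in to make the loop runnable; the point of the sketch is the feedback structure (two arms in, one bit out) and the reduction of that bit to a square-loss regression target that the function class can represent.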
Pages: 27
Related papers (50 total)
  • [1] Optimal Contextual Bandits with Knapsacks under Realizability via Regression Oracles
    Han, Yuxuan
    Zeng, Jialin
    Wang, Yang
    Xiang, Yang
    Zhang, Jiheng
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023
  • [2] Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits Under Realizability
    Simchi-Levi, David
    Xu, Yunzong
    MATHEMATICS OF OPERATIONS RESEARCH, VOL 47, NO 3, 2021, pp. 1-28
  • [3] Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits
    Saha, Aadirupa
    Gupta, Shubham
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022, pp. 19027-19049
  • [4] Tractable contextual bandits beyond realizability
    Krishnamurthy, Sanath Kumar
    Hadad, Vitor
    Athey, Susan
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), VOL 130, 2021
  • [5] Optimal Algorithms for Stochastic Contextual Preference Bandits
    Saha, Aadirupa
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021
  • [6] Stochastic Contextual Dueling Bandits under Linear Stochastic Transitivity Models
    Bengs, Viktor
    Saha, Aadirupa
    Huellermeier, Eyke
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [7] Instance-optimal PAC Algorithms for Contextual Bandits
    Li, Zhaoqi
    Ratliff, Lillian
    Nassif, Houssam
    Jamieson, Kevin
    Jain, Lalit
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [8] Provably Optimal Algorithms for Generalized Linear Contextual Bandits
    Li, Lihong
    Lu, Yu
    Zhou, Dengyong
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017
  • [9] Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions
    He, Jiafan
    Zhou, Dongruo
    Zhang, Tong
    Gu, Quanquan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [10] Jointly Efficient and Optimal Algorithms for Logistic Bandits
    Faury, Louis
    Abeille, Marc
    Jun, Kwang-Sung
    Calauzenes, Clement
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, pp. 546-580