Preference-based online learning with dueling bandits: A survey

Cited by: 0
Authors
Bengs, Viktor [1]
Busa-Fekete, Robert [2]
El Mesaoudi-Paul, Adil [1]
Hüllermeier, Eyke [1]
Affiliations
[1] Heinz Nixdorf Institute, Department of Computer Science, Paderborn University, Germany
[2] Google Research, New York, NY, United States
Related papers
50 in total
  • [1] Preference-based Online Learning with Dueling Bandits: A Survey
    Bengs, Viktor
    Busa-Fekete, Robert
    El Mesaoudi-Paul, Adil
    Huellermeier, Eyke
    JOURNAL OF MACHINE LEARNING RESEARCH, 2021, 22
  • [2] A Survey of Preference-Based Online Learning with Bandit Algorithms
    Busa-Fekete, Robert
    Huellermeier, Eyke
    ALGORITHMIC LEARNING THEORY (ALT 2014), 2014, 8776 : 18 - 39
  • [3] Dueling Posterior Sampling for Preference-Based Reinforcement Learning
    Novoseller, Ellen R.
    Wei, Yibing
    Sui, Yanan
    Yue, Yisong
    Burdick, Joel W.
    CONFERENCE ON UNCERTAINTY IN ARTIFICIAL INTELLIGENCE (UAI 2020), 2020, 124 : 1029 - 1038
  • [4] Contextual Bandits and Imitation Learning with Preference-Based Active Queries
    Sekhari, Ayush
    Sridharan, Karthik
    Sun, Wen
    Wu, Runzhe
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [6] A Survey of Preference-Based Reinforcement Learning Methods
    Wirth, Christian
    Akrour, Riad
    Neumann, Gerhard
    Fuernkranz, Johannes
    JOURNAL OF MACHINE LEARNING RESEARCH, 2017, 18
  • [7] Non-stationary Dueling Bandits for Online Learning to Rank
    Lu, Shiyin
    Miao, Yuan
    Yang, Ping
    Hu, Yao
    Zhang, Lijun
    WEB AND BIG DATA, PT II, APWEB-WAIM 2022, 2023, 13422 : 166 - 174
  • [8] Preference-based learning to rank
    Ailon, Nir
    Mohri, Mehryar
    MACHINE LEARNING, 2010, 80 : 189 - 211
  • [9] Preference-Based Policy Learning
    Akrour, Riad
    Schoenauer, Marc
    Sebag, Michele
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT I, 2011, 6911 : 12 - 27
  • [10] Preference-based learning to rank
    Ailon, Nir
    Mohri, Mehryar
    MACHINE LEARNING, 2010, 80 (2-3) : 189 - 211