50 records in total
- [2] A Survey of Preference-Based Online Learning with Bandit Algorithms. ALGORITHMIC LEARNING THEORY (ALT 2014), 2014, 8776: 18-39
- [3] Dueling Posterior Sampling for Preference-Based Reinforcement Learning. CONFERENCE ON UNCERTAINTY IN ARTIFICIAL INTELLIGENCE (UAI 2020), 2020, 124: 1029-1038
- [4] Contextual Bandits and Imitation Learning with Preference-Based Active Queries. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
- [7] Non-stationary Dueling Bandits for Online Learning to Rank. WEB AND BIG DATA, PT II, APWEB-WAIM 2022, 2023, 13422: 166-174
- [9] Preference-Based Policy Learning. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT I, 2011, 6911: 12-27