共 50 条
- [31] Online Learning in MDPs with Linear Function Approximation and Bandit Feedback ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
- [33] Adaptive Active Learning as a Multi-armed Bandit Problem 21ST EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (ECAI 2014), 2014, 263 : 989 - 990
- [34] Multi-armed Bandit Algorithms for Adaptive Learning: A Survey ARTIFICIAL INTELLIGENCE IN EDUCATION (AIED 2021), PT II, 2021, 12749 : 273 - 278
- [37] ONLINE LEARNING FOR COMPUTATION PEER OFFLOADING WITH SEMI-BANDIT FEEDBACK 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 4524 - 4528
- [39] Simulating Bandit Learning from User Feedback for Extractive Question Answering PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 5167 - 5179
- [40] Risk-Averse Trees for Learning from Logged Bandit Feedback 2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2017, : 976 - 983