共 50 条
- [31] Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
- [32] Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 9791 - 9798
- [33] Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
- [34] Distributed Lifelong Reinforcement Learning with Sub-Linear Regret 2017 IEEE 56TH ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2017,
- [35] Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
- [36] Optimistic posterior sampling for reinforcement learning: worst-case regret bounds ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
- [37] Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
- [40] Beyond No Regret: Instance-Dependent PAC Reinforcement Learning CONFERENCE ON LEARNING THEORY, VOL 178, 2022, 178 : 358 - 418