Near-optimal Reinforcement Learning in Factored MDPs

Citations: 0
Authors
Osband, Ian [1]
Van Roy, Benjamin [1]
Affiliation
[1] Stanford Univ, Stanford, CA 94305 USA
Funding
U.S. National Science Foundation
DOI
not available
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104 ; 0812 ; 0835 ; 1405
Abstract
Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer Ω(√(SAT)) regret on some MDP, where T is the elapsed time and S and A are the cardinalities of the state and action spaces. This implies T = Ω(SA) time to guarantee a near-optimal policy. In many settings of practical interest, due to the curse of dimensionality, S and A can be so enormous that this learning time is unacceptable. We establish that, if the system is known to be a factored MDP, it is possible to achieve regret that scales polynomially in the number of parameters encoding the factored MDP, which may be exponentially smaller than S or A. We provide two algorithms that satisfy near-optimal regret bounds in this context: posterior sampling reinforcement learning (PSRL) and an upper confidence bound algorithm (UCRL-Factored).
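The posterior sampling idea behind PSRL can be sketched in a simplified tabular (non-factored) form: maintain a posterior over the MDP, sample one MDP from it, and act greedily with respect to the sample. The Dirichlet prior over transitions, known mean rewards, and finite-horizon value iteration below are simplifying assumptions for illustration, not details taken from the paper:

```python
import numpy as np

def psrl_plan(counts, reward_means, horizon, rng):
    """One planning step of tabular PSRL (simplified sketch).

    counts[s, a]       -- Dirichlet posterior counts over next states
    reward_means[s, a] -- mean reward, assumed known here for brevity
    Returns the greedy policy for a single MDP drawn from the posterior.
    """
    S, A, _ = counts.shape
    # Sample a transition kernel from the Dirichlet posterior.
    P = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            P[s, a] = rng.dirichlet(counts[s, a])
    # Finite-horizon value iteration on the single sampled MDP.
    V = np.zeros(S)
    for _ in range(horizon):
        Q = reward_means + P @ V      # Q[s, a] = r(s, a) + E_{s'~P}[V(s')]
        policy = Q.argmax(axis=1)     # act greedily w.r.t. the sample
        V = Q.max(axis=1)
    return policy
```

The factored variant analyzed in the paper exploits the same sample-then-plan loop, but the posterior factorizes over the components encoding the factored MDP, which is what yields regret polynomial in the number of encoding parameters rather than in S or A.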
Pages: 9