Near-optimal Reinforcement Learning in Factored MDPs

Cited by: 0

Authors:

Osband, Ian [1]

Van Roy, Benjamin [1]

Affiliations:

[1] Stanford Univ, Stanford, CA 94305 USA

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014) | 2014 / Vol. 27

Funding:

U.S. National Science Foundation;

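Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract: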

Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer $\Omega(\sqrt{SAT})$ regret on some MDP, where $T$ is the elapsed time and $S$ and $A$ are the cardinalities of the state and action spaces. This implies $T = \Omega(SA)$ time to guarantee a near-optimal policy. In many settings of practical interest, due to the curse of dimensionality, $S$ and $A$ can be so enormous that this learning time is unacceptable. We establish that, if the system is known to be a factored MDP, it is possible to achieve regret that scales polynomially in the number of parameters encoding the factored MDP, which may be exponentially smaller than $S$ or $A$. We provide two algorithms that satisfy near-optimal regret bounds in this context: posterior sampling reinforcement learning (PSRL) and an upper confidence bound algorithm (UCRL-Factored).
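
To make the "exponentially smaller" claim in the abstract concrete, the sketch below counts transition-table entries for a flat tabular MDP and for a factored one. It is only an illustration under simplifying assumptions that are not taken from the paper: the state is a vector of `num_factors` variables with `values_per_factor` values each, the action is left unfactored, and each next-state variable depends on at most `parents_per_factor` state variables plus the action. The function and parameter names are hypothetical.

```python
def flat_transition_entries(num_factors: int, values_per_factor: int,
                            num_actions: int) -> int:
    """Entries in a tabular kernel P(s' | s, a) over the joint state space."""
    num_states = values_per_factor ** num_factors  # S grows exponentially
    return num_states * num_actions * num_states


def factored_transition_entries(num_factors: int, values_per_factor: int,
                                num_actions: int,
                                parents_per_factor: int) -> int:
    """Entries when each next-state variable has a small parent scope.

    Assumption (illustrative only): every variable's next value depends on
    `parents_per_factor` state variables and on the unfactored action.
    """
    # Number of distinct (parent values, action) configurations per variable.
    scope_size = (values_per_factor ** parents_per_factor) * num_actions
    # One conditional distribution over `values_per_factor` outcomes per
    # configuration, for each of the `num_factors` variables.
    return num_factors * scope_size * values_per_factor


if __name__ == "__main__":
    flat = flat_transition_entries(20, 2, 4)
    factored = factored_transition_entries(20, 2, 4, 3)
    print(f"flat entries:     {flat}")
    print(f"factored entries: {factored}")
```

With 20 binary state variables, the flat count grows with the square of the joint state space, while the factored count grows linearly in the number of variables and exponentially only in the scope size.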


Pages: 9
