Near-optimal Reinforcement Learning in Factored MDPs

Cited by: 0
Authors
Osband, Ian [1]
Van Roy, Benjamin [1]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
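Funding
National Science Foundation (USA);
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract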
Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer $\Omega(\sqrt{SAT})$ regret on some MDP, where $T$ is the elapsed time and $S$ and $A$ are the cardinalities of the state and action spaces. This implies $T = \Omega(SA)$ time to guarantee a near-optimal policy. In many settings of practical interest, due to the curse of dimensionality, $S$ and $A$ can be so enormous that this learning time is unacceptable. We establish that, if the system is known to be a factored MDP, it is possible to achieve regret that scales polynomially in the number of parameters encoding the factored MDP, which may be exponentially smaller than $S$ or $A$. We provide two algorithms that satisfy near-optimal regret bounds in this context: posterior sampling reinforcement learning (PSRL) and an upper confidence bound algorithm (UCRL-Factored).
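
The abstract's claim that a factored encoding can be exponentially smaller than the flat state space can be illustrated with a small parameter count. The sketch below is not taken from the paper: the function names, the ring-shaped dependency structure, and the choice of binary variables are all hypothetical, chosen only to make the gap concrete. It counts free transition-probability parameters for a flat tabular MDP and for a factored one in which each state variable depends on a small set of parent variables.

```python
from math import prod


def flat_transition_params(state_sizes, action_sizes):
    """Free parameters of a flat tabular transition model.

    One distribution over S next states (S - 1 free values) for each
    of the S * A state-action pairs.
    """
    S = prod(state_sizes)
    A = prod(action_sizes)
    return S * A * (S - 1)


def factored_transition_params(state_sizes, action_sizes, parents):
    """Free parameters when each state variable has a small parent scope.

    parents[i] lists indices into the joint variable list
    (state variables first, then action variables) that the next
    value of state variable i depends on. Hypothetical encoding.
    """
    sizes = list(state_sizes) + list(action_sizes)
    total = 0
    for i, scope in enumerate(parents):
        scope_size = prod(sizes[j] for j in scope)
        total += scope_size * (state_sizes[i] - 1)
    return total


# Illustrative assumption: 10 binary state variables on a ring and
# one binary action variable. Each state variable depends on itself,
# its left neighbour, and the action.
n = 10
state_sizes = [2] * n
action_sizes = [2]
parents = [[i, (i - 1) % n, n] for i in range(n)]

flat = flat_transition_params(state_sizes, action_sizes)
factored = factored_transition_params(state_sizes, action_sizes, parents)
print(flat, factored)
```

Under these assumptions the flat model needs about two million transition parameters while the factored model needs 80, and the gap widens exponentially as more state variables are added with the same scope size.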
Pages: 9
Related papers
50 in total
  • [41] Near-Optimal Goal-Oriented Reinforcement Learning in Non-Stationary Environments
    Chen, Liyu
    Luo, Haipeng
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [42] Reinforcement Learning for Near-Optimal Design of Zero-Delay Codes for Markov Sources
    Cregg, Liam
    Linder, Tamas
    Yuksel, Serdar
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2024, 70 (11) : 8399 - 8413
  • [43] Robust Near-optimal Control for Constrained Nonlinear System via Integral Reinforcement Learning
    Yu-Qing Qiu
    Yan Li
    Zhong Wang
    International Journal of Control, Automation and Systems, 2023, 21 : 1319 - 1330
  • [44] MEC-A Near-Optimal Online Reinforcement Learning Algorithm for Continuous Deterministic Systems
    Zhao, Dongbin
    Zhu, Yuanheng
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2015, 26 (02) : 346 - 356
  • [45] R-MAX - A general polynomial time algorithm for near-optimal reinforcement learning
    Brafman, RI
    Tennenholtz, M
    JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (02) : 213 - 231
  • [46] Reinforcement learning for MDPs with constraints
    Geibel, Peter
    MACHINE LEARNING: ECML 2006, PROCEEDINGS, 2006, 4212 : 646 - 653
  • [47] Kernel-based multiagent reinforcement learning for near-optimal formation control of mobile robots
    Ronghua Zhang
    Xin Xu
    Xinglong Zhang
    Quan Xiong
    Qingwen Ma
    Yaoqian Peng
    Applied Intelligence, 2023, 53 : 12736 - 12748
  • [48] Near-optimal learning with average Holder smoothness
    Hanneke, Steve
    Kontorovich, Aryeh
    Kornowski, Guy
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [49] A reinforcement learning-based near-optimal hierarchical approach for motion control: Design and experiment
    Qin, Zhi-Chang
    Zhu, Hai-Tao
    Wang, Shou-Jun
    Xin, Ying
    Sun, Jian-Qiao
    ISA TRANSACTIONS, 2022, 129 : 673 - 683
  • [50] Near-optimal interception strategy for orbital pursuit-evasion using deep reinforcement learning
    Zhang, Jingrui
    Zhang, Kunpeng
    Zhang, Yao
    Shi, Heng
    Tang, Liang
    Li, Mou
    ACTA ASTRONAUTICA, 2022, 198 : 9 - 25