Near-optimal Reinforcement Learning in Factored MDPs

Citations: 0
Authors
Osband, Ian [1]
Van Roy, Benjamin [1]
Affiliation
[1] Stanford Univ, Stanford, CA 94305 USA
Funding
U.S. National Science Foundation
DOI
not available
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104 ; 0812 ; 0835 ; 1405
Abstract
Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer Ω(√(SAT)) regret on some MDP, where T is the elapsed time and S and A are the cardinalities of the state and action spaces. This implies T = Ω(SA) time to guarantee a near-optimal policy. In many settings of practical interest, due to the curse of dimensionality, S and A can be so enormous that this learning time is unacceptable. We establish that, if the system is known to be a factored MDP, it is possible to achieve regret that scales polynomially in the number of parameters encoding the factored MDP, which may be exponentially smaller than S or A. We provide two algorithms that satisfy near-optimal regret bounds in this context: posterior sampling reinforcement learning (PSRL) and an upper confidence bound algorithm (UCRL-Factored).
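The posterior sampling idea behind PSRL can be sketched in a simplified tabular (non-factored) form: maintain a posterior over the MDP, sample one MDP from it, and act greedily with respect to the sample. The Dirichlet prior over transitions, known mean rewards, and finite-horizon value iteration below are simplifying assumptions for illustration, not details taken from the paper:

```python
import numpy as np

def psrl_plan(counts, reward_means, horizon, rng):
    """One planning step of tabular PSRL (simplified sketch).

    counts[s, a]       -- Dirichlet posterior counts over next states
    reward_means[s, a] -- mean reward, assumed known here for brevity
    Returns the greedy policy for a single MDP drawn from the posterior.
    """
    S, A, _ = counts.shape
    # Sample a transition kernel from the Dirichlet posterior.
    P = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            P[s, a] = rng.dirichlet(counts[s, a])
    # Finite-horizon value iteration on the single sampled MDP.
    V = np.zeros(S)
    for _ in range(horizon):
        Q = reward_means + P @ V      # Q[s, a] = r(s, a) + E_{s'~P}[V(s')]
        policy = Q.argmax(axis=1)     # act greedily w.r.t. the sample
        V = Q.max(axis=1)
    return policy
```

The factored variant analyzed in the paper exploits the same sample-then-plan loop, but the posterior factorizes over the components encoding the factored MDP, which is what yields regret polynomial in the number of encoding parameters rather than in S or A.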
Pages: 9