A sparse sampling algorithm for near-optimal planning in large Markov decision processes

Cited: 0
Authors:
Kearns, M [1]
Mansour, Y [1]
Ng, AY [1]
Affiliations:
[1] AT&T Labs, Murray Hill, NJ 07974 USA
Keywords:
DOI: not available
CLC number:
TP18 [Artificial Intelligence Theory]
Discipline codes:
081104; 0812; 0835; 1405
Abstract:
An issue that is critical for the application of Markov decision processes (MDPs) to realistic problems is how the complexity of planning scales with the size of the MDP. In stochastic environments with very large or even infinite state spaces, traditional planning and reinforcement learning algorithms are often inapplicable, since their running time typically scales linearly with the state space size. In this paper we present a new algorithm that, given only a generative model (simulator) for an arbitrary MDP, performs near-optimal planning with a running time that has no dependence on the number of states. Although the running time is exponential in the horizon time (which depends only on the discount factor gamma and the desired degree of approximation to the optimal policy), our results establish for the first time that there are no theoretical barriers to computing near-optimal policies in arbitrarily large, unstructured MDPs. Our algorithm is based on the idea of sparse sampling. We prove that a randomly sampled look-ahead tree that covers only a vanishing fraction of the full look-ahead tree nevertheless suffices to compute near-optimal actions from any state of an MDP. Practical implementations of the algorithm are discussed, and we draw ties to our related recent results on finding a near-best strategy from a given class of strategies in very large partially observable MDPs [KMN99].
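The sparse sampling idea the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: the paper derives specific values of the per-action sample width C and horizon H from the discount factor gamma and the desired accuracy, whereas the defaults below, and the generative-model callback `sample(s, a) -> (next_state, reward)`, are hypothetical placeholders.

```python
def sparse_sample_plan(state, actions, sample, gamma=0.9, C=4, H=3):
    """Estimate Q-values at `state` from a sparse look-ahead tree and
    return (best_action, its_value_estimate).

    `sample(s, a)` is an assumed generative-model interface returning a
    sampled (next_state, reward) pair. Running time is O((C * |actions|)**H),
    with no dependence on the number of states in the MDP.
    """

    def q_value(s, a, depth):
        # Average C sampled one-step outcomes, recursing on each next state.
        total = 0.0
        for _ in range(C):
            s2, r = sample(s, a)
            total += r + gamma * value(s2, depth - 1)
        return total / C

    def value(s, depth):
        if depth == 0:
            return 0.0  # truncate the look-ahead tree at horizon H
        return max(q_value(s, a, depth) for a in actions)

    q = {a: q_value(state, a, H) for a in actions}
    best = max(q, key=q.get)
    return best, q[best]
```

Because the recursion touches only the sampled states, the same call works whether the MDP has ten states or infinitely many; only C and H control the cost and the approximation quality.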
Pages: 1324-1331
Page count: 8
Related Papers
50 records in total
  • [1] A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes
    Kearns, M.; Mansour, Y.; Ng, A. Y.
    MACHINE LEARNING, 2002, 49 (2-3): 193-208
  • [2] Determining Near-Optimal Policies for Markov Renewal Decision Processes
    Boyse, J. W.
    IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, 1974, SMC-4 (2): 215-217
  • [3] Near-Optimal Randomized Exploration for Tabular Markov Decision Processes
    Xiong, Zhihan; Shen, Ruoqi; Cui, Qiwen; Fazel, Maryam; Du, Simon S.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [4] Near-Optimal Entrywise Sampling of Numerically Sparse Matrices
    Braverman, Vladimir; Krauthgamer, Robert; Krishnan, Aditya; Sapir, Shay
    CONFERENCE ON LEARNING THEORY, VOL 134, 2021, 134: 759-773
  • [5] Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model
    Sidford, Aaron; Wang, Mengdi; Wu, Xian; Yang, Lin F.; Ye, Yinyu
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [6] A Minimax Near-Optimal Algorithm for Adaptive Rejection Sampling
    Achddou, Juliette; Lam-Weil, Joseph; Carpentier, Alexandra; Blanchard, Gilles
    ALGORITHMIC LEARNING THEORY, VOL 98, 2019, 98
  • [7] Sparse Roadmap Spanners for Asymptotically Near-Optimal Motion Planning
    Dobson, Andrew; Bekris, Kostas E.
    INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2014, 33 (1): 18-47
  • [8] Layered Gibbs Sampling Algorithm for Near-Optimal Detection in Large-MIMO Systems
    Mandloi, Manish; Bhatia, Vimal
    2017 IEEE WIRELESS COMMUNICATIONS AND NETWORKING CONFERENCE (WCNC), 2017
  • [9] On a Near-Optimal and Efficient Algorithm for the Sparse Pooled Data Problem
    Hahn-Klimroth, Max; van der Hofstad, Remco; Mueller, Noela; Riddlesden, Connor
    BERNOULLI, 2025, 31 (2): 1579-1605