An evolutionary random policy search algorithm for solving Markov decision processes

Cited by: 6
Authors
Hu, Jiaqiao [1]
Fu, Michael C.
Ramezani, Vahid R.
Marcus, Steven I.
Affiliations
[1] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
[2] Univ Maryland, Robert H Smith Sch Business, College Pk, MD 20742 USA
[3] Univ Maryland, Syst Res Inst, College Pk, MD 20742 USA
[4] Univ Maryland, Dept Elect & Comp Engn, College Pk, MD 20742 USA
Keywords
dynamic programming; Markov; finite state; analysis of algorithms; programming; nonlinear; queues;
DOI
10.1287/ijoc.1050.0155
Chinese Library Classification
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
This paper presents a new randomized search method called evolutionary random policy search (ERPS) for solving infinite-horizon discounted-cost Markov-decision-process (MDP) problems. The algorithm is particularly targeted at problems with large or uncountable action spaces. ERPS approaches a given MDP by iteratively dividing it into a sequence of smaller, random, sub-MDP problems based on information obtained from random sampling of the entire action space and local search. Each sub-MDP is then solved approximately by using a variant of the standard policy-improvement technique, where an elite policy is obtained. We show that the sequence of elite policies converges to an optimal policy with probability one. Some numerical studies are carried out to illustrate the algorithm and compare it with existing procedures.
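The loop described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: it only assumes a finite state space, a scalar action in [0, 1], and a population of policies whose actions define each random sub-MDP. The elite policy is formed by taking, in each state, the action of the population member with the lowest cost in that state, which is one simple variant of policy improvement. The names `evaluate`, `erps_sketch`, `toy_cost`, `toy_trans` and the parameters `pop`, `q0`, `radius` are hypothetical and chosen for illustration only.

```python
import random

def evaluate(policy, cost, trans, gamma, sweeps=400):
    """Approximate the discounted cost of a stationary policy by fixed-point iteration."""
    n = len(policy)
    rows = [trans(x, policy[x]) for x in range(n)]    # transition row per state
    costs = [cost(x, policy[x]) for x in range(n)]    # one-stage cost per state
    J = [0.0] * n
    for _ in range(sweeps):
        J = [costs[x] + gamma * sum(p * j for p, j in zip(rows[x], J))
             for x in range(n)]
    return J

def erps_sketch(n_states, cost, trans, gamma=0.9, pop=6, iters=20,
                q0=0.5, radius=0.1, seed=0):
    """Illustrative loop: sub-MDP from a policy population, then an elite policy."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_states)] for _ in range(pop)]
    history = []
    for _ in range(iters):
        values = [evaluate(p, cost, trans, gamma) for p in population]
        # Elite policy: per state, copy the action of the cheapest population member.
        elite = []
        for x in range(n_states):
            best = min(range(pop), key=lambda j: values[j][x])
            elite.append(population[best][x])
        history.append(evaluate(elite, cost, trans, gamma))
        # Next population: keep the elite, then mix global sampling (probability q0)
        # with local search in a small neighborhood of the elite action.
        population = [elite]
        for _ in range(pop - 1):
            candidate = []
            for x in range(n_states):
                if rng.random() < q0:
                    a = rng.random()
                else:
                    a = min(1.0, max(0.0, elite[x] + rng.uniform(-radius, radius)))
                candidate.append(a)
            population.append(candidate)
    return elite, history

# Hypothetical toy problem: 3 states, action a in [0, 1].
def toy_cost(x, a):
    return float(x) + (a - 0.3) ** 2

def toy_trans(x, a, n=3):
    p = [0.0] * n
    p[0] += a                      # jump to state 0 with probability a
    p[(x + 1) % n] += 1.0 - a      # otherwise move to the next state
    return p

elite, history = erps_sketch(3, toy_cost, toy_trans)
print(elite, history[-1])
```

Because the elite policy is carried into the next population, its cost in every state should not increase from one iteration to the next in this sketch, up to the small error of the iterative evaluation.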
Pages: 161-174
Number of pages: 14