An evolutionary random policy search algorithm for solving Markov decision processes

Cited by: 6
Authors
Hu, Jiaqiao [1]
Fu, Michael C.
Ramezani, Vahid R.
Marcus, Steven I.
Affiliations
[1] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
[2] Univ Maryland, Robert H Smith Sch Business, College Pk, MD 20742 USA
[3] Univ Maryland, Syst Res Inst, College Pk, MD 20742 USA
[4] Univ Maryland, Dept Elect & Comp Engn, College Pk, MD 20742 USA
Keywords
dynamic programming; Markov; finite state; analysis of algorithms; programming; nonlinear; queues
DOI
10.1287/ijoc.1050.0155
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
This paper presents a new randomized search method called evolutionary random policy search (ERPS) for solving infinite-horizon discounted-cost Markov-decision-process (MDP) problems. The algorithm is particularly targeted at problems with large or uncountable action spaces. ERPS approaches a given MDP by iteratively dividing it into a sequence of smaller, random, sub-MDP problems based on information obtained from random sampling of the entire action space and local search. Each sub-MDP is then solved approximately by using a variant of the standard policy-improvement technique, where an elite policy is obtained. We show that the sequence of elite policies converges to an optimal policy with probability one. Some numerical studies are carried out to illustrate the algorithm and compare it with existing procedures.
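The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the parameter `q0`, the +/-1 local move on integer-indexed actions, and the use of a state-wise policy switch as the improvement step are all assumptions made for illustration. Each iteration builds a small population of policies (a random sub-MDP), combines them into an elite policy, and resamples around that elite.

```python
import random

def evaluate(policy, P, c, gamma, sweeps=500):
    """Approximate the discounted cost of a fixed policy by repeated backups."""
    n = len(policy)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [c[s][policy[s]]
             + gamma * sum(P[s][policy[s]][t] * v[t] for t in range(n))
             for s in range(n)]
    return v

def elite_policy(population, P, c, gamma):
    """Assumed improvement step: at each state, copy the action of the
    population member whose evaluated cost is lowest at that state."""
    values = [evaluate(pi, P, c, gamma) for pi in population]
    elite = []
    for s in range(len(population[0])):
        best = min(range(len(population)), key=lambda k: values[k][s])
        elite.append(population[best][s])
    return elite

def next_population(elite, n_actions, size, q0, rng):
    """Keep the elite; fill the rest by uniform sampling of the action space
    (probability q0) or a hypothetical local move of at most one index."""
    pop = [list(elite)]
    for _ in range(size - 1):
        pi = []
        for a in elite:
            if rng.random() < q0:
                pi.append(rng.randrange(n_actions))
            else:
                step = rng.choice((-1, 0, 1))
                pi.append(min(n_actions - 1, max(0, a + step)))
        pop.append(pi)
    return pop

def search(P, c, gamma, size=5, iters=30, q0=0.5, seed=0):
    """Run the sketch and return the final elite policy and its cost vector."""
    rng = random.Random(seed)
    n, m = len(P), len(P[0])
    pop = [[rng.randrange(m) for _ in range(n)] for _ in range(size)]
    elite = elite_policy(pop, P, c, gamma)
    for _ in range(iters):
        pop = next_population(elite, m, size, q0, rng)
        elite = elite_policy(pop, P, c, gamma)
    return elite, evaluate(elite, P, c, gamma)

# Toy MDP (made up for illustration): P[s][a][t] is a transition
# probability and c[s][a] is a one-stage cost.
P = [[[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]],
     [[0.8, 0.2], [0.4, 0.6], [0.2, 0.8]]]
c = [[2.0, 1.0, 3.0], [0.5, 1.5, 1.0]]

if __name__ == "__main__":
    policy, cost = search(P, c, gamma=0.9)
    print(policy, cost)
```

Because the elite is carried into every new population, its cost in this sketch cannot increase from one iteration to the next, up to the error of the approximate evaluation. The sketch makes no claim about the convergence result stated in the abstract.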
Pages: 161-174
Number of pages: 14
Related papers
50 in total
  • [1] Evolutionary policy iteration for solving Markov decision processes
    Chang, HS
    Lee, HG
    Fu, MC
    Marcus, SI
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2005, 50 (11) : 1804 - 1808
  • [2] Random search for constrained Markov decision processes with multi-policy improvement
    Chang, Hyeong Soo
    AUTOMATICA, 2015, 58 : 127 - 130
  • [3] Genetic algorithm methods for solving the best stationary policy of finite Markov decision processes
    Chen, HH
    Jafari, AA
    THIRTIETH SOUTHEASTERN SYMPOSIUM ON SYSTEM THEORY (SSST), 1998, : 538 - 543
  • [4] An adaptive sampling algorithm for solving Markov decision processes
    Chang, HS
    Fu, MC
    Hu, JQ
    Marcus, SI
    OPERATIONS RESEARCH, 2005, 53 (01) : 126 - 139
  • [5] Markov Chain Analyses of Random Local Search and Evolutionary Algorithm
    Furutani, Hiroshi
    Tagami, Hiroki
    To, Ichihi
    Sakamoto, Makoto
    PROCEEDINGS OF INTERNATIONAL CONFERENCE ON ARTIFICIAL LIFE AND ROBOTICS (ICAROB 2014), 2014, : 153 - 156
  • [6] Markov Chain Analyses of Random Local Search and Evolutionary Algorithm
    Furutani, Hiroshi
    Tagami, Hiroki
    Sakamoto, Makoto
    Du, Yifei
    JOURNAL OF ROBOTICS NETWORKING AND ARTIFICIAL LIFE, 2014, 1 (03) : 220 - 224
  • [7] Parallelizing parallel rollout algorithm for solving Markov Decision Processes
    Kim, SW
    Chang, HS
    OPENMP SHARED MEMORY PARALLEL PROGRAMMING, 2003, 2716 : 122 - 136
  • [8] Approximate Newton methods for policy search in Markov decision processes
    Furmston, Thomas
    Lever, Guy
    Barber, David
    Journal of Machine Learning Research, 2016, 17 : 1 - 51
  • [9] Approximate Newton Methods for Policy Search in Markov Decision Processes
    Furmston, Thomas
    Lever, Guy
    Barber, David
    JOURNAL OF MACHINE LEARNING RESEARCH, 2016, 17
  • [10] An exact iterative search algorithm for constrained Markov decision processes
    Chang, Hyeong Soo
    AUTOMATICA, 2014, 50 (05) : 1531 - 1534