An evolutionary random policy search algorithm for solving Markov decision processes

Cited by: 6

Authors:

Hu, Jiaqiao [1]

Fu, Michael C.

Ramezani, Vahid R.

Marcus, Steven I.

Affiliations:

[1] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA

[2] Univ Maryland, Robert H Smith Sch Business, College Pk, MD 20742 USA

[3] Univ Maryland, Syst Res Inst, College Pk, MD 20742 USA

[4] Univ Maryland, Dept Elect & Comp Engn, College Pk, MD 20742 USA

Source:

INFORMS JOURNAL ON COMPUTING | 2007 / Vol. 19 / No. 02

Keywords:

dynamic programming; Markov; finite state; analysis of algorithms; programming; nonlinear; queues;

DOI:

10.1287/ijoc.1050.0155

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Subject classification code:

081203 ; 0835 ;

Abstract:

This paper presents a new randomized search method called evolutionary random policy search (ERPS) for solving infinite-horizon discounted-cost Markov-decision-process (MDP) problems. The algorithm is particularly targeted at problems with large or uncountable action spaces. ERPS approaches a given MDP by iteratively dividing it into a sequence of smaller, random, sub-MDP problems based on information obtained from random sampling of the entire action space and local search. Each sub-MDP is then solved approximately by using a variant of the standard policy-improvement technique, where an elite policy is obtained. We show that the sequence of elite policies converges to an optimal policy with probability one. Some numerical studies are carried out to illustrate the algorithm and compare it with existing procedures.
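The iteration outlined in the abstract can be illustrated with a heavily simplified sketch for a small finite MDP. This is not the authors' ERPS procedure: the sub-MDP construction used here (each state keeps its current elite action plus `k` uniformly sampled actions), the single policy-improvement step per iteration, the toy transition and cost data, and every function and parameter name are assumptions made for illustration only.

```python
import random

def evaluate(P, c, policy, gamma, sweeps=500):
    """Approximate the discounted cost of a stationary policy by fixed-point sweeps."""
    n = len(policy)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [c[s][policy[s]]
             + gamma * sum(P[s][policy[s]][t] * v[t] for t in range(n))
             for s in range(n)]
    return v

def improve(P, c, v, allowed, gamma):
    """One policy-improvement step, restricted to each state's allowed actions."""
    n = len(v)
    def q(s, a):
        return c[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(n))
    return [min(allowed[s], key=lambda a, s=s: q(s, a)) for s in range(n)]

def random_sub_mdp_search(P, c, gamma=0.9, iters=200, k=2, seed=0):
    """Hypothetical simplified search: random sub-MDPs plus policy improvement."""
    rng = random.Random(seed)
    n, m = len(P), len(P[0])
    elite = [rng.randrange(m) for _ in range(n)]  # random initial policy
    for _ in range(iters):
        v = evaluate(P, c, elite, gamma)
        # Sub-MDP (assumed construction): elite action plus k random actions per state.
        allowed = [sorted({elite[s]} | {rng.randrange(m) for _ in range(k)})
                   for s in range(n)]
        # The elite action stays available, so the new policy is never worse.
        elite = improve(P, c, v, allowed, gamma)
    return elite, evaluate(P, c, elite, gamma)

# Toy data (made up): P[s][a] is a distribution over next states, c[s][a] a cost.
P = [
    [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]],
    [[0.8, 0.2], [0.4, 0.6], [0.2, 0.8]],
]
c = [
    [2.0, 1.0, 0.5],
    [3.0, 1.5, 1.0],
]

if __name__ == "__main__":
    policy, cost = random_sub_mdp_search(P, c)
    print("elite policy:", policy)
    print("discounted cost per state:", cost)
```

Because each sub-MDP always retains the current elite action, the improvement step cannot increase the cost in any state, which is the property that makes a sequence of elite policies meaningful in this toy setting.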

Citation

Pages: 161-174

Page count: 14

