An evolutionary random policy search algorithm for solving Markov decision processes

Cited by: 6

Authors:

Hu, Jiaqiao [1]

Fu, Michael C.

Ramezani, Vahid R.

Marcus, Steven I.

Affiliations:

[1] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA

[2] Univ Maryland, Robert H Smith Sch Business, College Pk, MD 20742 USA

[3] Univ Maryland, Syst Res Inst, College Pk, MD 20742 USA

[4] Univ Maryland, Dept Elect & Comp Engn, College Pk, MD 20742 USA

Source:

INFORMS JOURNAL ON COMPUTING | 2007, Vol. 19, No. 2

Keywords:

dynamic programming; Markov; finite state; analysis of algorithms; programming; nonlinear; queues;

DOI:

10.1287/ijoc.1050.0155

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

This paper presents a new randomized search method called evolutionary random policy search (ERPS) for solving infinite-horizon discounted-cost Markov-decision-process (MDP) problems. The algorithm is particularly targeted at problems with large or uncountable action spaces. ERPS approaches a given MDP by iteratively dividing it into a sequence of smaller, random, sub-MDP problems based on information obtained from random sampling of the entire action space and local search. Each sub-MDP is then solved approximately by using a variant of the standard policy-improvement technique, where an elite policy is obtained. We show that the sequence of elite policies converges to an optimal policy with probability one. Some numerical studies are carried out to illustrate the algorithm and compare it with existing procedures.
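The loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a finite state space, an action space already discretized into indices 0..A-1, and exact policy evaluation by a linear solve. The names `evaluate`, `erps_sketch`, `q0` (probability of searching near the current elite action), and `radius` (size of that local neighborhood) are hypothetical. Each iteration builds a population of random policies around the elite policy, treats the actions those policies choose at each state as a sub-MDP, and takes one improvement step restricted to that sub-MDP.

```python
import numpy as np


def evaluate(P, c, policy, gamma):
    """Exact discounted cost of a stationary policy.

    P has shape (S, A, S), c has shape (S, A), policy has shape (S,).
    """
    s = np.arange(c.shape[0])
    P_pi, c_pi = P[s, policy], c[s, policy]
    return np.linalg.solve(np.eye(len(s)) - gamma * P_pi, c_pi)


def erps_sketch(P, c, gamma=0.9, pop_size=6, q0=0.5, radius=2, iters=40, seed=0):
    """Illustrative loop only; all parameter names are assumptions."""
    rng = np.random.default_rng(seed)
    S, A = c.shape
    elite = rng.integers(A, size=S)              # arbitrary starting policy
    history = [evaluate(P, c, elite, gamma)]
    for _ in range(iters):
        # Population: the elite policy plus randomly generated policies.
        pop = [elite]
        for _ in range(pop_size - 1):
            # Local search: perturb the elite action index at each state.
            near = np.clip(elite + rng.integers(-radius, radius + 1, size=S), 0, A - 1)
            # Global search: sample uniformly from the whole action space.
            far = rng.integers(A, size=S)
            pop.append(np.where(rng.random(S) < q0, near, far))
        pop = np.array(pop)                      # shape (pop_size, S)

        # Sub-MDP: at state s only the actions pop[:, s] are allowed.
        values = np.array([evaluate(P, c, p, gamma) for p in pop])
        v_bar = values.min(axis=0)               # best known cost per state
        q = c + gamma * P @ v_bar                # one-step lookahead, shape (S, A)
        q_sub = np.take_along_axis(q, pop.T, axis=1)
        best = q_sub.argmin(axis=1)              # best population member per state
        elite = pop[best, np.arange(S)]
        history.append(evaluate(P, c, elite, gamma))
    return elite, np.array(history)


# Small random MDP used purely as made-up demonstration data.
rng = np.random.default_rng(1)
S, A = 5, 20
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)
c = rng.random((S, A))
elite, history = erps_sketch(P, c)
print(history[0].sum(), history[-1].sum())
```

Because the elite policy stays in the population and the improvement step is greedy with respect to the per-state minimum of the population's costs, the elite cost in this sketch cannot increase from one iteration to the next. The sketch does not by itself demonstrate the convergence result stated in the abstract.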

Pages: 161-174

Page count: 14

