An evolutionary random policy search algorithm for solving Markov decision processes

Cited by: 6
Authors
Hu, Jiaqiao [1 ]
Fu, Michael C.
Ramezani, Vahid R.
Marcus, Steven I.
Affiliations
[1] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
[2] Univ Maryland, Robert H Smith Sch Business, College Pk, MD 20742 USA
[3] Univ Maryland, Syst Res Inst, College Pk, MD 20742 USA
[4] Univ Maryland, Dept Elect & Comp Engn, College Pk, MD 20742 USA
Keywords
dynamic programming; Markov; finite state; analysis of algorithms; programming; nonlinear; queues
DOI
10.1287/ijoc.1050.0155
Chinese Library Classification
TP39 [Computer Applications]
Subject Classification Codes
081203; 0835
Abstract
This paper presents a new randomized search method called evolutionary random policy search (ERPS) for solving infinite-horizon discounted-cost Markov decision process (MDP) problems. The algorithm is particularly targeted at problems with large or uncountable action spaces. ERPS approaches a given MDP by iteratively dividing it into a sequence of smaller, random, sub-MDP problems based on information obtained from random sampling of the entire action space and local search. Each sub-MDP is then solved approximately by using a variant of the standard policy-improvement technique, where an elite policy is obtained. We show that the sequence of elite policies converges to an optimal policy with probability one. Some numerical studies are carried out to illustrate the algorithm and compare it with existing procedures.
Pages: 161-174 (14 pages)
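
The abstract outlines the ERPS loop: maintain an elite policy, build a random sub-MDP by sampling candidate actions for each state (mixing uniform sampling of the entire action space with local search around the elite actions), and apply a policy-improvement step restricted to those candidates to obtain the next elite policy. The sketch below is a minimal illustration of that loop on a hypothetical toy MDP with action space [0, 1]; the transition and cost functions, the exploitation probability `q`, the local-search radius `r`, and the candidate count `k` are assumptions made only to keep the example self-contained, and the single greedy improvement step stands in for the paper's sub-MDP solution variant rather than reproducing it.

```python
# Illustrative sketch of the evolutionary-random-policy-search idea, NOT the
# authors' implementation: the toy MDP, sampling parameters, and the simple
# greedy improvement step below are assumptions chosen for readability.
import numpy as np

rng = np.random.default_rng(0)

n_states = 5          # small finite state space
gamma = 0.9           # discount factor
k = 10                # candidate actions sampled per state each iteration
q = 0.7               # probability of local (exploitation) sampling vs. global sampling
r = 0.05              # local-search radius around the elite action
n_iters = 50

# A toy MDP with continuous action space A = [0, 1], the kind of large/uncountable
# action space the algorithm targets.
W = rng.normal(size=(n_states, n_states))

def transition(s, a):
    """Return P(. | s, a): a softmax over states that varies smoothly with the action."""
    logits = W[s] * np.cos(3.0 * a) + np.arange(n_states) * np.sin(2.0 * a + s)
    p = np.exp(logits - logits.max())
    return p / p.sum()

def cost(s, a):
    """One-stage cost, minimized somewhere in the interior of [0, 1] for each state."""
    return (a - s / (n_states - 1)) ** 2 + 0.1 * np.sin(5.0 * a)

def evaluate(policy):
    """Exact policy evaluation: solve (I - gamma * P_pi) V = c_pi."""
    P = np.array([transition(s, policy[s]) for s in range(n_states)])
    c = np.array([cost(s, policy[s]) for s in range(n_states)])
    return np.linalg.solve(np.eye(n_states) - gamma * P, c)

def improve(policy, V, candidates):
    """Greedy policy-improvement step restricted to the sampled sub-action sets."""
    new_policy = policy.copy()
    for s in range(n_states):
        qvals = [cost(s, a) + gamma * transition(s, a) @ V for a in candidates[s]]
        new_policy[s] = candidates[s][int(np.argmin(qvals))]
    return new_policy

# Elite policy: one action per state, initialized by uniform random sampling.
elite = rng.uniform(0.0, 1.0, size=n_states)

for it in range(n_iters):
    V = evaluate(elite)
    candidates = []
    for s in range(n_states):
        # With probability q, exploit by searching locally around the elite action;
        # otherwise explore by sampling the whole action space uniformly.
        if it > 0 and rng.uniform() < q:
            acts = np.clip(elite[s] + rng.uniform(-r, r, size=k), 0.0, 1.0)
        else:
            acts = rng.uniform(0.0, 1.0, size=k)
        candidates.append(np.append(acts, elite[s]))  # keep the elite action in the sub-MDP
    elite = improve(elite, V, candidates)

print("elite policy:", np.round(elite, 3))
print("discounted cost-to-go:", np.round(evaluate(elite), 3))
```

In this sketch, keeping the current elite action in every candidate set together with exact policy evaluation makes each iteration non-worsening, while the uniform-sampling branch keeps exploring the full action space; how the actual ERPS algorithm constructs and solves its sub-MDPs is detailed in the paper itself.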