An evolutionary random policy search algorithm for solving Markov decision processes

Cited by: 6

Authors:

Hu, Jiaqiao [1]

Fu, Michael C.

Ramezani, Vahid R.

Marcus, Steven I.

Affiliations:

[1] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA

[2] Univ Maryland, Robert H Smith Sch Business, College Pk, MD 20742 USA

[3] Univ Maryland, Syst Res Inst, College Pk, MD 20742 USA

[4] Univ Maryland, Dept Elect & Comp Engn, College Pk, MD 20742 USA

Source:

INFORMS JOURNAL ON COMPUTING | 2007 / Vol. 19 / No. 2

Keywords:

dynamic programming; Markov; finite state; analysis of algorithms; programming; nonlinear; queues

DOI:

10.1287/ijoc.1050.0155

Chinese Library Classification (CLC):

TP39 [Computer applications]

Subject classification codes:

081203; 0835

Abstract:

This paper presents a new randomized search method called evolutionary random policy search (ERPS) for solving infinite-horizon discounted-cost Markov-decision-process (MDP) problems. The algorithm is particularly targeted at problems with large or uncountable action spaces. ERPS approaches a given MDP by iteratively dividing it into a sequence of smaller, random, sub-MDP problems based on information obtained from random sampling of the entire action space and local search. Each sub-MDP is then solved approximately by using a variant of the standard policy-improvement technique, where an elite policy is obtained. We show that the sequence of elite policies converges to an optimal policy with probability one. Some numerical studies are carried out to illustrate the algorithm and compare it with existing procedures.
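The loop described in the abstract can be illustrated with a minimal Python sketch. It is a simplified reading of the abstract, not the authors' implementation, and it assumes a small finite MDP with integer-indexed actions. The function names (`evaluate`, `erps_sketch`), the parameters (`pop_size`, `q0`, `radius`), and the specific rules used here are all assumptions: the elite policy takes, at each state, the action of the population member with the lowest value at that state, and new policies mix uniform sampling over the whole action space with a local perturbation of the elite action.

```python
import numpy as np

def evaluate(policy, P, c, gamma):
    """Exact discounted cost of a deterministic policy.

    Solves (I - gamma * P_pi) J = c_pi, where P has shape
    (n_states, n_actions, n_states) and c has shape (n_states, n_actions).
    """
    n = P.shape[0]
    idx = np.arange(n)
    P_pi = P[idx, policy]   # transition matrix under the policy, shape (n, n)
    c_pi = c[idx, policy]   # one-stage cost under the policy, shape (n,)
    return np.linalg.solve(np.eye(n) - gamma * P_pi, c_pi)

def erps_sketch(P, c, gamma=0.9, pop_size=8, iters=30, q0=0.5, radius=2, seed=0):
    """Hypothetical ERPS-style loop; all parameter names are assumptions."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = c.shape
    states = np.arange(n_states)
    # Initial population: policies drawn uniformly at random.
    population = rng.integers(n_actions, size=(pop_size, n_states))
    history = []
    for _ in range(iters):
        # The population's actions at each state define a small sub-MDP.
        values = np.array([evaluate(pi, P, c, gamma) for pi in population])
        # Elite policy: at each state, copy the action of the policy whose
        # value is lowest there (assumed stand-in for the paper's variant
        # of policy improvement).
        best = values.argmin(axis=0)
        elite = population[best, states]
        history.append(evaluate(elite, P, c, gamma).sum())
        # Next population: keep the elite, then build the rest state by state.
        new_pop = [elite]
        for _ in range(pop_size - 1):
            explore = rng.random(n_states) < q0
            uniform = rng.integers(n_actions, size=n_states)   # whole action space
            step = rng.integers(-radius, radius + 1, size=n_states)
            local = np.clip(elite + step, 0, n_actions - 1)    # near the elite action
            new_pop.append(np.where(explore, uniform, local))
        population = np.array(new_pop)
    return elite, history
```

Because the elite policy is carried into the next population, the summed elite cost recorded in `history` should not increase from one iteration to the next. The local step treats neighboring action indices as "close", which only makes sense when the actions are a discretization of an ordered set.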


Pages: 161 - 174

Page count: 14

50 records in total

[31] Policy set iteration for Markov decision processes
Chang, Hyeong Soo
AUTOMATICA, 2013, 49 (12) : 3687 - 3689
[32] Solving Markov Decision Processes with Downside Risk Adjustment
Abhijit Gosavi
Anish Parulekar
International Journal of Automation and Computing, 2016, 13 (03) : 235 - 245
[33] Solving Markov decision processes with downside risk adjustment
Gosavi A.
Parulekar A.
International Journal of Automation and Computing, 2016, 13 (3) : 235 - 245
[34] Solving Markov Decision Processes with Partial State Abstractions
Nashed, Samer B.
Svegliato, Justin
Brucato, Matteo
Basich, Connor
Grupen, Rod
Zilberstein, Shlomo
2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 813 - 819
[35] Solving transition independent decentralized Markov decision processes
Becker, R
Zilberstein, S
Lesser, V
Goldman, CV
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2004, 22 : 423 - 455
[36] Solving transition independent decentralized Markov decision processes
Becker, Raphen
Zilberstein, Shlomo
Lesser, Victor
Goldman, Claudia V.
Journal of Artificial Intelligence Research, 2004, 22 : 423 - 455
[37] MDPFuzz: Testing Models Solving Markov Decision Processes
Pang, Qi
Yuan, Yuanyuan
Wang, Shuai
PROCEEDINGS OF THE 31ST ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON SOFTWARE TESTING AND ANALYSIS, ISSTA 2022, 2022, : 378 - 390
[38] The policy iteration algorithm for average reward Markov decision processes with general state space
Meyn, SP
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1997, 42 (12) : 1663 - 1680
[39] Random Markov decision processes for sustainable infrastructure systems
Meidani, Hadi
Ghanem, Roger
STRUCTURE AND INFRASTRUCTURE ENGINEERING, 2015, 11 (05) : 655 - 667
[40] APPLICATION OF MARKOV DECISION-PROCESSES TO SEARCH PROBLEMS
HARTMAN, LB
VANHEE, KM
DECISION SUPPORT SYSTEMS, 1995, 14 (03) : 283 - 298
