An evolutionary random policy search algorithm for solving Markov decision processes

Cited by: 6
Authors
Hu, Jiaqiao [1]
Fu, Michael C.
Ramezani, Vahid R.
Marcus, Steven I.
Affiliations
[1] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
[2] Univ Maryland, Robert H Smith Sch Business, College Pk, MD 20742 USA
[3] Univ Maryland, Syst Res Inst, College Pk, MD 20742 USA
[4] Univ Maryland, Dept Elect & Comp Engn, College Pk, MD 20742 USA
Keywords
dynamic programming; Markov; finite state; analysis of algorithms; programming; nonlinear; queues
DOI
10.1287/ijoc.1050.0155
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
This paper presents a new randomized search method called evolutionary random policy search (ERPS) for solving infinite-horizon discounted-cost Markov-decision-process (MDP) problems. The algorithm is particularly targeted at problems with large or uncountable action spaces. ERPS approaches a given MDP by iteratively dividing it into a sequence of smaller, random, sub-MDP problems based on information obtained from random sampling of the entire action space and local search. Each sub-MDP is then solved approximately by using a variant of the standard policy-improvement technique, where an elite policy is obtained. We show that the sequence of elite policies converges to an optimal policy with probability one. Some numerical studies are carried out to illustrate the algorithm and compare it with existing procedures.
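The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy cost and transition functions, the discount factor, the population size, the local-search radius `radius`, and the probability `q0` of choosing local search over sampling the whole action space are all assumptions made for illustration. Each population of policies plays the role of one random sub-MDP, and the elite policy is built by taking, in each state, the action of the population member with the lowest evaluated cost at that state, which is one simple variant of policy improvement.

```python
import random

GAMMA = 0.9      # discount factor (assumed)
N_STATES = 3     # toy state space (assumed); actions lie in [0, 1]

def cost(s, a):
    # Hypothetical one-stage cost: each state prefers a different action.
    return (a - 0.2 * (s + 1)) ** 2

def transitions(s, a):
    # Hypothetical dynamics: move to the next state w.p. a, otherwise stay.
    return [(a, (s + 1) % N_STATES), (1.0 - a, s)]

def evaluate(policy, sweeps=200):
    """Iteratively evaluate a deterministic policy (one action per state)."""
    v = [0.0] * N_STATES
    for _ in range(sweeps):
        v = [cost(s, policy[s])
             + GAMMA * sum(p * v[t] for p, t in transitions(s, policy[s]))
             for s in range(N_STATES)]
    return v

def elite_from(population):
    """Per state, copy the action of the member with the lowest value there."""
    values = [evaluate(pi) for pi in population]
    elite = []
    for s in range(N_STATES):
        best = min(range(len(population)), key=lambda i: values[i][s])
        elite.append(population[best][s])
    return elite

def next_population(elite, size, q0=0.5, radius=0.1, rng=random):
    """Build the next random sub-problem around the current elite policy."""
    population = [list(elite)]  # keep the elite so quality cannot degrade
    for _ in range(size - 1):
        pi = []
        for s in range(N_STATES):
            if rng.random() < q0:
                # Local search: perturb the elite action, clipped to [0, 1].
                a = min(1.0, max(0.0, elite[s] + rng.uniform(-radius, radius)))
            else:
                # Global search: sample the entire action space.
                a = rng.uniform(0.0, 1.0)
            pi.append(a)
        population.append(pi)
    return population

def erps_sketch(iterations=30, size=10, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(0.0, 1.0) for _ in range(N_STATES)]
                  for _ in range(size)]
    elite = elite_from(population)
    history = [evaluate(elite)]
    for _ in range(iterations):
        population = next_population(elite, size, rng=rng)
        elite = elite_from(population)
        history.append(evaluate(elite))
    return elite, history
```

Calling `erps_sketch()` returns the final elite policy and the list of its value functions across iterations. Because the previous elite is always kept in the next population, the evaluated cost of the elite policy does not increase from one iteration to the next in this sketch.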
Pages: 161-174
Number of pages: 14
Related papers
50 in total
  • [31] Policy set iteration for Markov decision processes
    Chang, Hyeong Soo
    AUTOMATICA, 2013, 49 (12) : 3687 - 3689
  • [32] Solving Markov Decision Processes with Downside Risk Adjustment
    Abhijit Gosavi
    Anish Parulekar
    International Journal of Automation and Computing, 2016, 13 (03) : 235 - 245
  • [34] Solving Markov Decision Processes with Partial State Abstractions
    Nashed, Samer B.
    Svegliato, Justin
    Brucato, Matteo
    Basich, Connor
    Grupen, Rod
    Zilberstein, Shlomo
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 813 - 819
  • [35] Solving transition independent decentralized Markov decision processes
    Becker, R
    Zilberstein, S
    Lesser, V
    Goldman, CV
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2004, 22 : 423 - 455
  • [37] MDPFuzz: Testing Models Solving Markov Decision Processes
    Pang, Qi
    Yuan, Yuanyuan
    Wang, Shuai
    PROCEEDINGS OF THE 31ST ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON SOFTWARE TESTING AND ANALYSIS, ISSTA 2022, 2022, : 378 - 390
  • [38] The policy iteration algorithm for average reward Markov decision processes with general state space
    Meyn, SP
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1997, 42 (12) : 1663 - 1680
  • [39] Random Markov decision processes for sustainable infrastructure systems
    Meidani, Hadi
    Ghanem, Roger
    STRUCTURE AND INFRASTRUCTURE ENGINEERING, 2015, 11 (05) : 655 - 667
  • [40] APPLICATION OF MARKOV DECISION-PROCESSES TO SEARCH PROBLEMS
    HARTMAN, LB
    VANHEE, KM
    DECISION SUPPORT SYSTEMS, 1995, 14 (03) : 283 - 298