An evolutionary random policy search algorithm for solving Markov decision processes

Cited by: 6
Authors
Hu, Jiaqiao [1 ]
Fu, Michael C.
Ramezani, Vahid R.
Marcus, Steven I.
Affiliations
[1] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
[2] Univ Maryland, Robert H Smith Sch Business, College Pk, MD 20742 USA
[3] Univ Maryland, Syst Res Inst, College Pk, MD 20742 USA
[4] Univ Maryland, Dept Elect & Comp Engn, College Pk, MD 20742 USA
Keywords
dynamic programming; Markov; finite state; analysis of algorithms; programming; nonlinear; queues
DOI
10.1287/ijoc.1050.0155
Chinese Library Classification
TP39 [Computer Applications]
Subject Classification Codes
081203; 0835
Abstract
This paper presents a new randomized search method called evolutionary random policy search (ERPS) for solving infinite-horizon discounted-cost Markov decision process (MDP) problems. The algorithm is particularly targeted at problems with large or uncountable action spaces. ERPS approaches a given MDP by iteratively dividing it into a sequence of smaller, random, sub-MDP problems based on information obtained from random sampling of the entire action space and local search. Each sub-MDP is then solved approximately by using a variant of the standard policy-improvement technique, where an elite policy is obtained. We show that the sequence of elite policies converges to an optimal policy with probability one. Some numerical studies are carried out to illustrate the algorithm and compare it with existing procedures.
Pages: 161-174 (14 pages)
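
The abstract outlines the ERPS loop: maintain an elite policy, build a random sub-MDP by sampling candidate actions for each state (mixing uniform sampling of the entire action space with local search around the elite actions), and apply a policy-improvement step restricted to those candidates to obtain the next elite policy. The sketch below is a minimal illustration of that loop on a hypothetical toy MDP with action space [0, 1]; the transition and cost functions, the exploitation probability `q`, the local-search radius `r`, and the candidate count `k` are assumptions made only to keep the example self-contained, and the single greedy improvement step stands in for the paper's sub-MDP solution variant rather than reproducing it.

```python
# Illustrative sketch of the evolutionary-random-policy-search idea, NOT the
# authors' implementation: the toy MDP, sampling parameters, and the simple
# greedy improvement step below are assumptions chosen for readability.
import numpy as np

rng = np.random.default_rng(0)

n_states = 5          # small finite state space
gamma = 0.9           # discount factor
k = 10                # candidate actions sampled per state each iteration
q = 0.7               # probability of local (exploitation) sampling vs. global sampling
r = 0.05              # local-search radius around the elite action
n_iters = 50

# A toy MDP with continuous action space A = [0, 1], the kind of large/uncountable
# action space the algorithm targets.
W = rng.normal(size=(n_states, n_states))

def transition(s, a):
    """Return P(. | s, a): a softmax over states that varies smoothly with the action."""
    logits = W[s] * np.cos(3.0 * a) + np.arange(n_states) * np.sin(2.0 * a + s)
    p = np.exp(logits - logits.max())
    return p / p.sum()

def cost(s, a):
    """One-stage cost, minimized somewhere in the interior of [0, 1] for each state."""
    return (a - s / (n_states - 1)) ** 2 + 0.1 * np.sin(5.0 * a)

def evaluate(policy):
    """Exact policy evaluation: solve (I - gamma * P_pi) V = c_pi."""
    P = np.array([transition(s, policy[s]) for s in range(n_states)])
    c = np.array([cost(s, policy[s]) for s in range(n_states)])
    return np.linalg.solve(np.eye(n_states) - gamma * P, c)

def improve(policy, V, candidates):
    """Greedy policy-improvement step restricted to the sampled sub-action sets."""
    new_policy = policy.copy()
    for s in range(n_states):
        qvals = [cost(s, a) + gamma * transition(s, a) @ V for a in candidates[s]]
        new_policy[s] = candidates[s][int(np.argmin(qvals))]
    return new_policy

# Elite policy: one action per state, initialized by uniform random sampling.
elite = rng.uniform(0.0, 1.0, size=n_states)

for it in range(n_iters):
    V = evaluate(elite)
    candidates = []
    for s in range(n_states):
        # With probability q, exploit by searching locally around the elite action;
        # otherwise explore by sampling the whole action space uniformly.
        if it > 0 and rng.uniform() < q:
            acts = np.clip(elite[s] + rng.uniform(-r, r, size=k), 0.0, 1.0)
        else:
            acts = rng.uniform(0.0, 1.0, size=k)
        candidates.append(np.append(acts, elite[s]))  # keep the elite action in the sub-MDP
    elite = improve(elite, V, candidates)

print("elite policy:", np.round(elite, 3))
print("discounted cost-to-go:", np.round(evaluate(elite), 3))
```

In this sketch, keeping the current elite action in every candidate set together with exact policy evaluation makes each iteration non-worsening, while the uniform-sampling branch keeps exploring the full action space; how the actual ERPS algorithm constructs and solves its sub-MDPs is detailed in the paper itself.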