Approximate receding horizon approach for Markov decision processes: average reward case

Cited by: 20
Authors
Chang, HS [2]
Marcus, SI [1]
Affiliations
[1] Univ Maryland, Dept Elect & Comp Engn, College Pk, MD 20742 USA
[2] Sogang Univ, Dept Comp Sci & Engn, Seoul, South Korea
Keywords
Markov decision process; receding horizon control; infinite-horizon average reward; policy improvement; rollout; ergodicity
DOI
10.1016/S0022-247X(03)00506-7
Chinese Library Classification (CLC)
O29 [Applied Mathematics]
Discipline code
070104
Abstract
We consider an approximation scheme for solving Markov decision processes (MDPs) with countable state space, finite action space, and bounded rewards. The scheme uses an approximate solution of a fixed finite-horizon sub-MDP of a given infinite-horizon MDP to create a stationary policy, which we call "approximate receding horizon control." We first analyze the performance of approximate receding horizon control for the infinite-horizon average reward under an ergodicity assumption, generalizing a result of White (J. Oper. Res. Soc. 33 (1982) 253-259). We then study two instances of approximate receding horizon control obtained via lower bounds on the exact solution of the sub-MDP: the first policy is based on a finite-horizon approximation of Howard's policy improvement of a single policy, and the second on a generalization of that improvement step to multiple policies. Along the way, we also provide a simple alternative proof of policy improvement for countable state spaces. Finally, we discuss practical implementations of these schemes via simulation. (C) 2003 Elsevier Inc. All rights reserved.
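The abstract describes the scheme only in words; the following is a minimal, hypothetical Python sketch of the two ingredients it names, run on a randomly generated toy MDP. Backward induction over an H-step sub-MDP yields the stationary "approximate receding horizon" policy, and a finite-horizon rollout of a base policy stands in for the finite-horizon approximation of Howard's policy improvement. All names here (P, R, H, the function names) are illustrative assumptions, not the paper's notation.

```python
# Hypothetical sketch of approximate receding horizon control on a toy MDP.
# P, R, H and the function names are illustrative, not the paper's notation.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, H = 5, 3, 10              # toy sizes; H = sub-MDP horizon

# Random toy MDP: P[a, s, s'] is the transition kernel, R[s, a] a bounded reward.
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)              # normalize rows to probabilities
R = rng.random((n_states, n_actions))

def receding_horizon_policy():
    """Solve the H-horizon sub-MDP by backward induction and return the
    first-stage greedy action in every state, used as a stationary policy."""
    V = np.zeros(n_states)                     # terminal-value approximation
    for _ in range(H):
        Q = R + np.einsum('ast,t->sa', P, V)   # Q[s, a] = R[s, a] + E[V(s')]
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def rollout_policy(base):
    """One-step improvement of `base`, with its value estimated over horizon H
    (a finite-horizon stand-in for Howard's policy improvement)."""
    s = np.arange(n_states)
    V = np.zeros(n_states)
    for _ in range(H):                         # H-horizon value of the base policy
        V = R[s, base] + np.einsum('ast,t->sa', P, V)[s, base]
    Q = R + np.einsum('ast,t->sa', P, V)       # improve: act greedily once
    return Q.argmax(axis=1)

pi = receding_horizon_policy()
print("receding-horizon policy:  ", pi)
print("rollout improvement of pi:", rollout_policy(pi))
```

Under the paper's ergodicity assumption, the average reward of such a policy approaches the optimal average reward as H grows; the multiple-policy variant mentioned in the abstract would presumably replace `base` with a maximization over several base policies' H-horizon values.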
Pages: 636-651
Page count: 16
Related papers
50 records in total
  • [1] Learning Infinite-Horizon Average-Reward Markov Decision Processes with Constraints
    Chen, Liyu
    Jain, Rahul
    Luo, Haipeng
    International Conference on Machine Learning, Vol. 162, 2022
  • [2] Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes
    Bai, Qinbo
    Mondal, Washim Uddin
    Aggarwal, Vaneet
    Thirty-Eighth AAAI Conference on Artificial Intelligence, Vol. 38, No. 10, 2024: 10980-10988
  • [4] Average-Reward Decentralized Markov Decision Processes
    Petrik, Marek
    Zilberstein, Shlomo
    20th International Joint Conference on Artificial Intelligence, 2007: 1997-2002
  • [5] Robust Average-Reward Markov Decision Processes
    Wang, Yue
    Velasquez, Alvaro
    Atia, George
    Prater-Bennette, Ashley
    Zou, Shaofeng
    Thirty-Seventh AAAI Conference on Artificial Intelligence, Vol. 37, No. 12, 2023: 15215-15223
  • [6] A unified approach to adaptive control of average reward Markov decision processes
    Hubner, G.
    OR Spektrum, 1988, 10(3): 161-166
  • [7] A Duality Approach for Regret Minimization in Average-Reward Ergodic Markov Decision Processes
    Gong, Hao
    Wang, Mengdi
    Learning for Dynamics and Control, Vol. 120, 2020: 862-883
  • [8] A Unified Approach for Semi-Markov Decision Processes with Discounted and Average Reward Criteria
    Li, Yanjie
    Wang, Huijing
    Chen, Haoyao
    2014 11th World Congress on Intelligent Control and Automation (WCICA), 2014: 1741-1744
  • [9] Average Optimality in Nonhomogeneous Infinite Horizon Markov Decision Processes
    Wachs, Allise O.
    Schochetman, Irwin E.
    Smith, Robert L.
    Mathematics of Operations Research, 2011, 36(1): 147-164
  • [10] Markov decision processes: discounted expected reward or average expected reward
    White, D. J.
    Journal of Mathematical Analysis and Applications, 1993, 172(2): 375-384