Approximate receding horizon approach for Markov decision processes: average reward case

Cited by: 20

Authors:

Chang, HS

Marcus, SI [1]

Affiliations:

[1] Univ Maryland, Dept Elect & Comp Engn, College Pk, MD 20742 USA

[2] Sogang Univ, Dept Comp Sci & Engn, Seoul, South Korea

Source:

JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS | 2003 / Vol. 286 / Issue 02

Keywords:

Markov decision process; receding horizon control; infinite-horizon average reward; policy improvement; rollout; ergodicity;

DOI:

10.1016/S0022-247X(03)00506-7

Chinese Library Classification (CLC) number:

O29 [Applied Mathematics];

Subject classification code:

070104 ;

Abstract:

We consider an approximation scheme for solving Markov decision processes (MDPs) with countable state space, finite action space, and bounded rewards that uses an approximate solution of a fixed finite-horizon sub-MDP of a given infinite-horizon MDP to create a stationary policy, which we call "approximate receding horizon control." We first analyze the performance of the approximate receding horizon control for infinite-horizon average reward under an ergodicity assumption, which also generalizes the result obtained by White (J. Oper. Res. Soc. 33 (1982) 253-259). We then study two examples of the approximate receding horizon control via lower bounds to the exact solution to the sub-MDP. The first control policy is based on a finite-horizon approximation of Howard's policy improvement of a single policy and the second policy is based on a generalization of the single policy improvement for multiple policies. Along the study, we also provide a simple alternative proof on the policy improvement for countable state space. We finally discuss practical implementations of these schemes via simulation. (C) 2003 Elsevier Inc. All rights reserved.
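The receding horizon idea in the abstract can be illustrated with a minimal sketch. It makes several simplifying assumptions that are not in the paper: the state space is finite (the paper treats countable spaces), the finite-horizon sub-MDP is solved exactly by backward induction rather than approximately, and the resulting policy is unichain so that its average reward follows from a stationary distribution. All function names and the array layout (`P[a, s, s']` for transitions, `R[s, a]` for rewards) are hypothetical choices made for illustration.

```python
import numpy as np


def finite_horizon_values(P, R, horizon):
    """Backward induction for the horizon-step sub-MDP, starting from V_0 = 0.

    P has shape (actions, states, states); R has shape (states, actions).
    """
    V = np.zeros(R.shape[0])
    for _ in range(horizon):
        # Q[s, a] = R[s, a] + sum over s' of P[a, s, s'] * V[s']
        Q = R + np.einsum("asn,n->sa", P, V)
        V = Q.max(axis=1)
    return V


def receding_horizon_policy(P, R, horizon):
    """Stationary policy that plays the first action of an H-horizon plan."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    V_prev = finite_horizon_values(P, R, horizon - 1)
    Q = R + np.einsum("asn,n->sa", P, V_prev)
    return Q.argmax(axis=1)


def average_reward(P, R, policy):
    """Long-run average reward of a stationary policy (assumes a unichain policy)."""
    n = R.shape[0]
    states = np.arange(n)
    P_pi = P[policy, states, :]   # row s is the transition row under policy[s]
    r_pi = R[states, policy]
    # Stationary distribution: mu P_pi = mu together with sum(mu) = 1.
    A = np.vstack([P_pi.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    mu = np.linalg.lstsq(A, b, rcond=None)[0]
    return float(mu @ r_pi)


# Toy two-state, two-action MDP (illustrative numbers, not from the paper).
P = np.array([[[0.9, 0.1], [0.1, 0.9]],    # action 0: mostly stay
              [[0.1, 0.9], [0.9, 0.1]]])   # action 1: mostly switch
R = np.array([[0.0, 0.0],                  # state 0 pays nothing
              [1.0, 1.0]])                 # state 1 pays one unit
policy = receding_horizon_policy(P, R, horizon=2)
print(policy, average_reward(P, R, policy))
```

Replacing `finite_horizon_values` with an estimate, for example a simulation-based lower bound on the sub-MDP value, would give an approximate variant in the spirit of the schemes the abstract describes; the sketch does not attempt to reproduce the paper's error bounds.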

Pages: 636-651

Number of pages: 16

50 items in total

[21] Average Reward Reinforcement Learning for Semi-Markov Decision Processes
Yang, Jiayuan
Li, Yanjie
Chen, Haoyao
Li, Jiangang
NEURAL INFORMATION PROCESSING, ICONIP 2017, PT I, 2017, 10634 : 768 - 777
[22] Detection-averse optimal and receding-horizon control for Markov decision processes
Li, Nan
Kolmanovsky, Ilya
Girard, Anouck
AUTOMATICA, 2020, 122
[23] Adaptive aggregation for reinforcement learning in average reward Markov decision processes
Ortner, Ronald
ANNALS OF OPERATIONS RESEARCH, 2013, 208 (01) : 321 - 336
[24] AN IMPROVED ALGORITHM FOR SOLVING COMMUNICATING AVERAGE REWARD MARKOV DECISION PROCESSES
Haviv, Moshe
Puterman, Martin L.
ANNALS OF OPERATIONS RESEARCH, 1991, 28 (01) : 229 - 242
[25] Fast Approximate Dynamic Programming for Infinite-Horizon Markov Decision Processes
Kolarijani, M. A. S.
Max, G. F.
Esfahani, P. Mohajerin
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
[26] Approximate stochastic annealing for online control of infinite horizon Markov decision processes
Hu, Jiaqiao
Chang, Hyeong Soo
AUTOMATICA, 2012, 48 (09) : 2182 - 2188
[27] APPROXIMATE FIXED POINT ITERATION WITH AN APPLICATION TO INFINITE HORIZON MARKOV DECISION PROCESSES
Almudevar, Anthony
SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2008, 47 (05) : 2303 - 2347
[28] On average reward semi-markov decision processes with a general multichain structure
Jianyong, L
Xiaobo, Z
MATHEMATICS OF OPERATIONS RESEARCH, 2004, 29 (02) : 339 - 352
[29] Incremental Improvements of Heuristic Policies for Average-Reward Markov Decision Processes
Reveliotis, S.
Ibrahim, M.
IFAC PAPERSONLINE, 2020, 53 (02) : 1721 - 1728
[30] RVI Reinforcement Learning for Semi-Markov Decision Processes with Average Reward
Li, Yanjie
Cao, Fang
2010 8TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA), 2010 : 1674 - 1679