Potential-based online policy iteration algorithms for Markov decision processes

Times cited: 26

Authors:

Fang, HT [1]

Cao, XR [2]

Affiliations:

[1] Chinese Acad Sci, Acad Math & Syst Sci, Lab Syst & Control, Beijing 100080, Peoples R China

[2] Hong Kong Univ Sci & Technol, Kowloon, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON AUTOMATIC CONTROL | 2004 / Vol. 49 / No. 4

Funding:

National Natural Science Foundation of China

Keywords:

Markov decision process; potential; recursive optimization

DOI:

10.1109/TAC.2004.825647

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

Performance potentials play a crucial role in performance sensitivity analysis and policy iteration of Markov decision processes. The potentials can be estimated on a single sample path of a Markov process. In this paper, we propose two potential-based online policy iteration algorithms for performance optimization of Markov systems. The algorithms are based on online estimation of potentials and stochastic approximation. We prove that with these two algorithms the optimal policy can be attained after a finite number of iterations. A simulation example is given to illustrate the main ideas and the convergence rates of the algorithms.
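The Python sketch below illustrates the general idea in the abstract: potentials are estimated from a single simulated sample path and then used in a policy-improvement step. It is not the authors' algorithm. The two-state MDP (`P`, `R`), the truncation horizon, the path length, and all function names are hypothetical. The transition probabilities are assumed known for the improvement step. The potentials are estimated with one simple truncated-sum estimator, assumed here for illustration. Finally, the online stochastic-approximation updates described in the paper are replaced by a plain batch re-estimate on a fresh path at each iteration.

```python
import random

# Hypothetical two-state, two-action MDP, chosen only for illustration.
# P[a][i][j]: probability of moving from state i to state j under action a.
P = [
    [[0.9, 0.1], [0.2, 0.8]],  # action 0
    [[0.1, 0.9], [0.7, 0.3]],  # action 1
]
# R[i][a]: one-step reward for taking action a in state i.
R = [[1.0, 0.0], [2.0, 5.0]]
N_STATES, N_ACTIONS = 2, 2


def simulate_path(policy, length, rng, start=0):
    """Simulate one sample path X_0, ..., X_{length-1} under a fixed policy."""
    path = [start]
    for _ in range(length - 1):
        i = path[-1]
        path.append(rng.choices(range(N_STATES), weights=P[policy[i]][i])[0])
    return path


def estimate_potentials(policy, path, horizon):
    """Estimate potentials g(i), up to an additive constant, as the average of
    sum_{l<horizon} (f(X_{n+l}) - eta) over the visits X_n = i on the path."""
    rewards = [R[x][policy[x]] for x in path]
    eta = sum(rewards) / len(rewards)  # sample-path average reward
    total = [0.0] * N_STATES
    visits = [0] * N_STATES
    window = sum(rewards[:horizon])  # holds sum(rewards[n : n + horizon])
    for n in range(len(path) - horizon):
        total[path[n]] += window - horizon * eta
        visits[path[n]] += 1
        window += rewards[n + horizon] - rewards[n]  # slide the window by one
    # A state never visited gets 0.0, an arbitrary placeholder value.
    g = [total[i] / visits[i] if visits[i] else 0.0 for i in range(N_STATES)]
    return g, eta


def improve(policy, g):
    """Policy improvement with estimated potentials: choose the action that
    maximizes R[i][a] + sum_j P[a][i][j] * g[j], keeping the current action
    unless another one is strictly better."""
    def q(i, a):
        return R[i][a] + sum(P[a][i][j] * g[j] for j in range(N_STATES))

    new_policy = list(policy)
    for i in range(N_STATES):
        for a in range(N_ACTIONS):
            if q(i, a) > q(i, new_policy[i]) + 1e-9:
                new_policy[i] = a
    return new_policy


def sample_path_policy_iteration(policy, path_length=20000, horizon=20,
                                 max_iters=20, seed=0):
    """Alternate sample-path potential estimation and policy improvement until
    the policy stops changing (or max_iters is reached). Returns the final
    policy and the average-reward estimate from the last simulated path."""
    rng = random.Random(seed)
    eta = None
    for _ in range(max_iters):
        path = simulate_path(policy, path_length, rng)
        g, eta = estimate_potentials(policy, path, horizon)
        new_policy = improve(policy, g)
        if new_policy == policy:
            break
        policy = new_policy
    return policy, eta


if __name__ == "__main__":
    best, eta = sample_path_policy_iteration([0, 0])
    print("policy:", best, "estimated average reward:", round(eta, 3))
```

In this toy MDP, the policy that takes action 1 in both states has the highest average reward (exactly 2.8125). Starting from `[0, 0]`, the loop should reach it after one improvement step. The improvement step keeps the current action unless another action is strictly better. This is the usual policy-iteration tie-breaking rule, which prevents cycling between equally good actions.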


Pages: 493-505

Number of pages: 13

Related records: 50 in total

[31] SERIAL AND PARALLEL VALUE-ITERATION ALGORITHMS FOR DISCOUNTED MARKOV DECISION-PROCESSES
ARCHIBALD, TW
MCKINNON, KIM
THOMAS, LC
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 1993, 67 (02) : 188 - 203
[32] Online Markov Decision Processes
Even-Dar, Eyal
Kakade, Sham M.
Mansour, Yishay
MATHEMATICS OF OPERATIONS RESEARCH, 2009, 34 (03) : 726 - 736
[33] Value set iteration for Markov decision processes
Chang, Hyeong Soo
AUTOMATICA, 2014, 50 (07) : 1940 - 1943
[34] COMPUTATIONAL COMPARISON OF VALUE-ITERATION ALGORITHMS FOR DISCOUNTED MARKOV DECISION-PROCESSES
THOMAS, LC
HARLEY, R
LAVERCOMBE, AC
OPERATIONS RESEARCH LETTERS, 1983, 2 (02) : 72 - 76
[35] COMPUTATIONALLY EFFICIENT ALGORITHMS FOR ONLINE OPTIMIZATION OF MARKOV DECISION-PROCESSES
JALALI, A
FERGUSON, MJ
AUTOMATICA, 1992, 28 (01) : 107 - 118
[36] Advantage Based Value Iteration for Markov Decision Processes with Unknown Rewards
Alizadeh, Pegah
Chevaleyre, Yann
Levy, Francois
2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2016: 3837 - 3844
[37] Deterministic policy gradient algorithms for semi-Markov decision processes
Hosseinloo, Ashkan Haji
Dahleh, Munther A.
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2022, 37 (07) : 4008 - 4019
[38] Efficient Off-Policy Algorithms for Structured Markov Decision Processes
Ganguly, Sourav
Diddigi, Raghuram Bharadwaj
Prabuchandran, K. J.
2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023: 8312 - 8319
[39] A NEW POLICY ITERATION SCHEME FOR MARKOV DECISION-PROCESSES USING SCHWEITZER FORMULA
LASSERRE, JB
JOURNAL OF APPLIED PROBABILITY, 1994, 31 (01) : 268 - 273
[40] The policy iteration algorithm for average reward Markov decision processes with general state space
Meyn, SP
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1997, 42 (12) : 1663 - 1680
