Modified policy iteration algorithms are not strongly polynomial for discounted dynamic programming

Cited by: 3
Authors
Feinberg, Eugene A. [1 ]
Huang, Jefferson [1 ]
Scherrer, Bruno [2 ,3 ]
Affiliations
[1] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
[2] Inria, F-54600 Villers Les Nancy, France
[3] Univ Lorraine, LORIA, UMR 7503, F-54506 Vandoeuvre Les Nancy, France
Keywords
Markov decision process; Modified policy iteration; Strongly polynomial; Policy; Algorithm; MARKOV DECISION-PROBLEMS; SIMPLEX; MDPS;
DOI
10.1016/j.orl.2014.07.006
Chinese Library Classification
C93 [Management]; O22 [Operations Research];
Subject Classification Codes
070105; 12; 1201; 1202; 120202;
Abstract
This note shows that the number of arithmetic operations required by any member of a broad class of optimistic policy iteration algorithms to solve a deterministic discounted dynamic programming problem with three states and four actions can be arbitrarily large. Hence no such algorithm is strongly polynomial; in particular, the modified policy iteration and λ-policy iteration algorithms are not strongly polynomial. (C) 2014 Elsevier B.V. All rights reserved.
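For context, modified policy iteration (the best-known member of the optimistic policy iteration class discussed above) alternates one greedy policy improvement step with m partial-evaluation backups under the fixed policy; m = 0 recovers value iteration, and letting m grow without bound recovers policy iteration. Below is a minimal NumPy sketch under assumed conventions: the array shapes, the parameter name m, and the sup-norm stopping rule are illustrative choices, not taken from the paper, and the random demo instance is not the paper's deterministic three-state, four-action counterexample.

```python
import numpy as np

def modified_policy_iteration(P, r, gamma, m=5, tol=1e-8, max_iter=10_000):
    """Modified policy iteration for a discounted MDP (illustrative sketch).

    P: array of shape (A, S, S), P[a, s, t] = transition probability s -> t under a.
    r: array of shape (S, A), one-step rewards.
    gamma: discount factor in [0, 1).
    m: number of partial-evaluation backups per iteration
       (m = 0 gives value iteration; large m approaches policy iteration).
    """
    A, S, _ = P.shape
    v = np.zeros(S)
    pi = np.zeros(S, dtype=int)
    for _ in range(max_iter):
        # Policy improvement: act greedily with respect to the current value.
        q = r + gamma * np.einsum('ast,t->sa', P, v)   # Q(s, a), shape (S, A)
        pi = np.argmax(q, axis=1)
        v_greedy = q[np.arange(S), pi]
        if np.max(np.abs(v_greedy - v)) < tol:         # illustrative stopping rule
            return pi, v_greedy
        v = v_greedy
        # Partial policy evaluation: m extra backups under the fixed policy pi.
        P_pi = P[pi, np.arange(S), :]                  # row s is P[pi[s], s, :]
        r_pi = r[np.arange(S), pi]
        for _ in range(m):
            v = r_pi + gamma * P_pi @ v
    return pi, v

if __name__ == "__main__":
    # Random stochastic MDP for illustration only; the size mirrors the
    # 3-state, 4-action setting from the abstract, not its construction.
    rng = np.random.default_rng(0)
    S, A = 3, 4
    P = rng.random((A, S, S))
    P /= P.sum(axis=2, keepdims=True)
    r = rng.random((S, A))
    pi, v = modified_policy_iteration(P, r, gamma=0.9, m=5)
    print("greedy policy:", pi, "values:", v)
```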
Pages: 429-431
Number of pages: 3
Related Papers
50 records in total
  • [41] Policy Iteration Approximate Dynamic Programming Using Volterra Series Based Actor
    Guo, Wentao
    Si, Jennie
    Liu, Feng
    Mei, Shengwei
    PROCEEDINGS OF THE 2014 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2014, : 249 - 255
  • [42] Robust Modified Policy Iteration
    Kaufman, David L.
    Schaefer, Andrew J.
    INFORMS JOURNAL ON COMPUTING, 2013, 25 (03) : 396 - 410
  • [43] Projections for Approximate Policy Iteration Algorithms
    Akrour, Riad
    Pajarinen, Joni
    Peters, Jan
    Neumann, Gerhard
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019
  • [44] Improved Strongly Polynomial Algorithms for Deterministic MDPs, 2VPI Feasibility, and Discounted All-Pairs Shortest Paths
    Karczmarz, Adam
    PROCEEDINGS OF THE 2022 ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, SODA, 2022, : 154 - 172
  • [45] Discrete-Time Nonzero-Sum Games for Multiplayer Using Policy-Iteration-Based Adaptive Dynamic Programming Algorithms
    Zhang, Huaguang
    Jiang, He
    Luo, Chaomin
    Xiao, Geyang
    IEEE TRANSACTIONS ON CYBERNETICS, 2017, 47 (10) : 3331 - 3340
  • [46] A PERTURBATION APPROACH TO A CLASS OF DISCOUNTED APPROXIMATE VALUE ITERATION ALGORITHMS WITH BOREL SPACES
    Vega-Amaya, Oscar
    Lopez-Borbon, Joaquin
    JOURNAL OF DYNAMICS AND GAMES, 2016, 3 (03): : 261 - 278
  • [47] A note on policy algorithms for discounted Markov decision problems
    Ng, MK
    OPERATIONS RESEARCH LETTERS, 1999, 25 (04) : 195 - 197
  • [48] Policy Iteration Adaptive Dynamic Programming Algorithm for Discrete-Time Nonlinear Systems
    Liu, Derong
    Wei, Qinglai
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2014, 25 (03) : 621 - 634
  • [49] Generalized Policy Iteration Adaptive Dynamic Programming for Discrete-Time Nonlinear Systems
    Liu, Derong
    Wei, Qinglai
    Yan, Pengfei
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2015, 45 (12): : 1577 - 1591
  • [50] Policy iteration-approximate dynamic programming for large scale unit commitment problems
    Wei, Hua
    Long, Danli
    Li, Jinghua
    Zhongguo Dianji Gongcheng Xuebao/Proceedings of the Chinese Society of Electrical Engineering, 2014, 34 (25): : 4420 - 4429