Modified policy iteration algorithms are not strongly polynomial for discounted dynamic programming

Cited by: 3
Authors:
Feinberg, Eugene A. [1 ]
Huang, Jefferson [1 ]
Scherrer, Bruno [2 ,3 ]
Affiliations:
[1] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
[2] Inria, F-54600 Villers Les Nancy, France
[3] Univ Lorraine, LORIA, UMR 7503, F-54506 Vandoeuvre Les Nancy, France
Keywords:
Markov decision process; Modified policy iteration; Strongly polynomial; Policy; Algorithm; MARKOV DECISION-PROBLEMS; SIMPLEX; MDPS;
DOI: 10.1016/j.orl.2014.07.006
CLC Classification: C93 (Management Science); O22 (Operations Research)
Discipline Codes: 070105; 12; 1201; 1202; 120202
Abstract:
This note shows that the number of arithmetic operations required by any member of a broad class of optimistic policy iteration algorithms to solve a deterministic discounted dynamic programming problem with three states and four actions may grow arbitrarily. Therefore any such algorithm is not strongly polynomial. In particular, the modified policy iteration and lambda-policy iteration algorithms are not strongly polynomial. (C) 2014 Elsevier B.V. All rights reserved.
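The abstract concerns modified policy iteration (MPI), which interleaves a greedy policy-improvement step with only a partial policy evaluation: m applications of the Bellman operator for the current greedy policy, so that m = 0 recovers value iteration and m → ∞ recovers policy iteration. The following is a minimal sketch of that scheme on an arbitrary small deterministic discounted MDP; the 2-state, 2-action instance below is an illustrative assumption, not the 3-state, 4-action example constructed in the paper.

```python
import numpy as np

def modified_policy_iteration(P, R, gamma=0.9, m=5, iters=200, tol=1e-10):
    """Sketch of modified policy iteration.

    P -- list of transition matrices, one (n x n) matrix per action
    R -- (n_states x n_actions) reward matrix
    m -- number of Bellman backups used for partial policy evaluation
    """
    n_states, n_actions = R.shape
    v = np.zeros(n_states)
    policy = np.zeros(n_states, dtype=int)
    for _ in range(iters):
        # Improvement step: one-step lookahead Q-values, then greedy policy.
        q = R + gamma * np.stack([P[a] @ v for a in range(n_actions)], axis=1)
        policy = q.argmax(axis=1)
        # Partial evaluation step: m applications of the Bellman operator
        # for the greedy policy (m = 0 gives value iteration, exact solve
        # would give policy iteration).
        v_new = v
        for _ in range(m):
            v_new = R[np.arange(n_states), policy] + gamma * np.array(
                [P[policy[s]][s] @ v_new for s in range(n_states)])
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    return v, policy

# Deterministic 2-state, 2-action example (assumed, for illustration only):
# action 0 always moves to state 0, action 1 always moves to state 1.
P = [np.array([[1.0, 0.0], [1.0, 0.0]]),
     np.array([[0.0, 1.0], [0.0, 1.0]])]
R = np.array([[0.0, 1.0],    # state 0: reward 1 for moving to state 1
              [0.0, 2.0]])   # state 1: reward 2 for staying in state 1
v, pi = modified_policy_iteration(P, R)
```

With discount 0.9, the optimal policy picks action 1 in both states, giving v(1) = 2/(1 - 0.9) = 20 and v(0) = 1 + 0.9·20 = 19. The paper's result is that the number of arithmetic operations such a scheme needs, over all choices of the evaluation-step lengths, cannot be bounded by a polynomial in the number of states and actions alone.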
Pages: 429-431 (3 pages)