Modified policy iteration algorithms are not strongly polynomial for discounted dynamic programming

Cited by: 3
Authors:
Feinberg, Eugene A. [1 ]
Huang, Jefferson [1 ]
Scherrer, Bruno [2 ,3 ]
Affiliations:
[1] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
[2] Inria, F-54600 Villers Les Nancy, France
[3] Univ Lorraine, LORIA, UMR 7503, F-54506 Vandoeuvre Les Nancy, France
Keywords:
Markov decision process; Modified policy iteration; Strongly polynomial; Policy; Algorithm; MARKOV DECISION-PROBLEMS; SIMPLEX; MDPS;
DOI: 10.1016/j.orl.2014.07.006
CLC Classification: C93 (Management Science); O22 (Operations Research)
Discipline Codes: 070105; 12; 1201; 1202; 120202
Abstract:
This note shows that the number of arithmetic operations required by any member of a broad class of optimistic policy iteration algorithms to solve a deterministic discounted dynamic programming problem with three states and four actions may grow arbitrarily. Therefore any such algorithm is not strongly polynomial. In particular, the modified policy iteration and lambda-policy iteration algorithms are not strongly polynomial. (C) 2014 Elsevier B.V. All rights reserved.
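The abstract concerns modified policy iteration (MPI), which interleaves a greedy policy-improvement step with only a partial policy evaluation: m applications of the Bellman operator for the current greedy policy, so that m = 0 recovers value iteration and m → ∞ recovers policy iteration. The following is a minimal sketch of that scheme on an arbitrary small deterministic discounted MDP; the 2-state, 2-action instance below is an illustrative assumption, not the 3-state, 4-action example constructed in the paper.

```python
import numpy as np

def modified_policy_iteration(P, R, gamma=0.9, m=5, iters=200, tol=1e-10):
    """Sketch of modified policy iteration.

    P -- list of transition matrices, one (n x n) matrix per action
    R -- (n_states x n_actions) reward matrix
    m -- number of Bellman backups used for partial policy evaluation
    """
    n_states, n_actions = R.shape
    v = np.zeros(n_states)
    policy = np.zeros(n_states, dtype=int)
    for _ in range(iters):
        # Improvement step: one-step lookahead Q-values, then greedy policy.
        q = R + gamma * np.stack([P[a] @ v for a in range(n_actions)], axis=1)
        policy = q.argmax(axis=1)
        # Partial evaluation step: m applications of the Bellman operator
        # for the greedy policy (m = 0 gives value iteration, exact solve
        # would give policy iteration).
        v_new = v
        for _ in range(m):
            v_new = R[np.arange(n_states), policy] + gamma * np.array(
                [P[policy[s]][s] @ v_new for s in range(n_states)])
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    return v, policy

# Deterministic 2-state, 2-action example (assumed, for illustration only):
# action 0 always moves to state 0, action 1 always moves to state 1.
P = [np.array([[1.0, 0.0], [1.0, 0.0]]),
     np.array([[0.0, 1.0], [0.0, 1.0]])]
R = np.array([[0.0, 1.0],    # state 0: reward 1 for moving to state 1
              [0.0, 2.0]])   # state 1: reward 2 for staying in state 1
v, pi = modified_policy_iteration(P, R)
```

With discount 0.9, the optimal policy picks action 1 in both states, giving v(1) = 2/(1 - 0.9) = 20 and v(0) = 1 + 0.9·20 = 19. The paper's result is that the number of arithmetic operations such a scheme needs, over all choices of the evaluation-step lengths, cannot be bounded by a polynomial in the number of states and actions alone.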
Pages: 429-431 (3 pages)