Modified policy iteration algorithms are not strongly polynomial for discounted dynamic programming

Cited: 3
Authors
Feinberg, Eugene A. [1 ]
Huang, Jefferson [1 ]
Scherrer, Bruno [2 ,3 ]
Affiliations
[1] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
[2] Inria, F-54600 Villers Les Nancy, France
[3] Univ Lorraine, LORIA, UMR 7503, F-54506 Vandoeuvre Les Nancy, France
Keywords
Markov decision process; Modified policy iteration; Strongly polynomial; Policy; Algorithm; MARKOV DECISION-PROBLEMS; SIMPLEX; MDPS;
DOI
10.1016/j.orl.2014.07.006
Chinese Library Classification
C93 [Management Science]; O22 [Operations Research];
Subject Classification Codes
070105 ; 12 ; 1201 ; 1202 ; 120202 ;
Abstract
This note shows that the number of arithmetic operations required by any member of a broad class of optimistic policy iteration algorithms to solve a deterministic discounted dynamic programming problem with three states and four actions may grow arbitrarily. Therefore, no such algorithm is strongly polynomial. In particular, the modified policy iteration and λ-policy iteration algorithms are not strongly polynomial. © 2014 Elsevier B.V. All rights reserved.
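To illustrate the algorithm class the abstract refers to, the following is a minimal sketch of modified policy iteration (MPI) on a discounted MDP. The 2-state, 2-action deterministic MDP below is a hypothetical example for illustration only, not the 3-state, 4-action instance constructed in the paper; the parameter `m` controls how many partial-evaluation backups are performed per improvement step (m = 0 recovers value iteration, m → ∞ recovers policy iteration).

```python
import numpy as np

# Hypothetical deterministic discounted MDP (not the paper's instance):
# r[s, a]  = immediate reward for taking action a in state s
# nxt[s, a] = deterministic successor state
gamma = 0.9
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])
nxt = np.array([[0, 1],
                [0, 1]])

def modified_policy_iteration(m, iters=100):
    """Modified policy iteration: greedy improvement followed by
    m partial-evaluation backups under the current policy."""
    n_states = r.shape[0]
    v = np.zeros(n_states)
    pi = np.zeros(n_states, dtype=int)
    for _ in range(iters):
        # Policy improvement: one-step greedy lookahead
        q = r + gamma * v[nxt]          # q[s, a] = r(s,a) + gamma * v(nxt(s,a))
        pi = q.argmax(axis=1)
        v = q.max(axis=1)
        # Partial policy evaluation: m extra backups under pi
        idx = np.arange(n_states)
        for _ in range(m):
            v = r[idx, pi] + gamma * v[nxt[idx, pi]]
    return pi, v

pi, v = modified_policy_iteration(m=3)
# For this MDP the optimal policy takes action 1 in both states,
# with values v = [18, 20] (state 1 earns 2/(1-0.9) = 20).
```

The paper's point is that, even on a fixed tiny instance like this, the number of iterations such optimistic schemes need cannot be bounded by a polynomial in the number of states and actions alone (independent of the discount factor), so no member of the class is strongly polynomial.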
Pages: 429-431
Page count: 3
Related Papers
(50 total)
  • [32] A Modified Value Iteration Algorithm for Discounted Markov Decision Processes
    Chafik, Sanaa
    Daoui, Cherki
    JOURNAL OF ELECTRONIC COMMERCE IN ORGANIZATIONS, 2015, 13 (03) : 47 - 57
  • [33] Policy Iteration for H∞ Optimal Control of Polynomial Nonlinear Systems via Sum of Squares Programming
    Zhu, Yuanheng
    Zhao, Dongbin
    Yang, Xiong
    Zhang, Qichao
    IEEE TRANSACTIONS ON CYBERNETICS, 2018, 48 (02) : 500 - 509
  • [35] Policy Approximation in Policy Iteration Approximate Dynamic Programming for Discrete-Time Nonlinear Systems
    Guo, Wentao
    Si, Jennie
    Liu, Feng
    Mei, Shengwei
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (07) : 2794 - 2807
  • [36] The Simplex and Policy-Iteration Methods Are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate
    Ye, Yinyu
    MATHEMATICS OF OPERATIONS RESEARCH, 2011, 36 (04) : 593 - 603
  • [37] INVARIANT PROBLEMS IN DISCOUNTED DYNAMIC-PROGRAMMING
    ASSAF, D
    ADVANCES IN APPLIED PROBABILITY, 1978, 10 (02) : 472 - 490
  • [38] Two computationally efficient polynomial-iteration infeasible interior-point algorithms for linear programming
    Yang, Y.
    NUMERICAL ALGORITHMS, 2018, 79 (03) : 957 - 992
  • [39] Policy Iteration Based Approximate Dynamic Programming Toward Autonomous Driving in Constrained Dynamic Environment
    Lin, Ziyu
    Ma, Jun
    Duan, Jingliang
    Li, Shengbo Eben
    Ma, Haitong
    Cheng, Bo
    Lee, Tong Heng
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2023, 24 (05) : 5003 - 5013