Modified policy iteration algorithms are not strongly polynomial for discounted dynamic programming

Cited: 3
Authors
Feinberg, Eugene A. [1 ]
Huang, Jefferson [1 ]
Scherrer, Bruno [2 ,3 ]
Affiliations
[1] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
[2] Inria, F-54600 Villers-lès-Nancy, France
[3] Univ Lorraine, LORIA, UMR 7503, F-54506 Vandoeuvre-lès-Nancy, France
Keywords
Markov decision process; Modified policy iteration; Strongly polynomial; Policy; Algorithm; MARKOV DECISION-PROBLEMS; SIMPLEX; MDPS
DOI
10.1016/j.orl.2014.07.006
Chinese Library Classification
C93 [Management Science]; O22 [Operations Research]
Subject Classification Codes
070105; 12; 1201; 1202; 120202
Abstract
This note shows that the number of arithmetic operations required by any member of a broad class of optimistic policy iteration algorithms to solve a deterministic discounted dynamic programming problem with three states and four actions can grow arbitrarily large. Therefore, no such algorithm is strongly polynomial. In particular, the modified policy iteration and λ-policy iteration algorithms are not strongly polynomial. (C) 2014 Elsevier B.V. All rights reserved.
Pages: 429-431
Number of pages: 3
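
Since the abstract invokes modified policy iteration without defining it, a minimal Python sketch of the algorithm for a finite discounted MDP may help. The function name, the array layout, and the use of a fixed number m of partial-evaluation sweeps are illustrative assumptions; the paper's result concerns a broader class of optimistic policy iteration schemes in which the number of sweeps may vary from iteration to iteration.

import numpy as np

def modified_policy_iteration(P, r, gamma, m=5, tol=1e-8, max_iter=10000):
    # Illustrative sketch (cf. Puterman & Shin 1978), not the paper's exact scheme.
    # P: transition tensor, shape (num_actions, num_states, num_states)
    # r: reward matrix, shape (num_states, num_actions)
    # gamma: discount factor in (0, 1)
    # m: partial-evaluation sweeps per iteration; m = 0 reduces to value
    #    iteration, and m -> infinity approaches policy iteration.
    n_states = r.shape[0]
    v = np.zeros(n_states)
    for _ in range(max_iter):
        # Greedy improvement: one Bellman-optimality backup.
        q = r + gamma * np.einsum('asj,j->sa', P, v)
        policy = q.argmax(axis=1)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return policy, v_new
        # Partial evaluation: m further backups with the greedy policy held fixed.
        r_pi = r[np.arange(n_states), policy]
        P_pi = P[policy, np.arange(n_states), :]
        for _ in range(m):
            v_new = r_pi + gamma * P_pi @ v_new
        v = v_new
    return policy, v

The point of the note above is that no rule for choosing the sweep counts makes such a scheme strongly polynomial: even on a fixed three-state, four-action deterministic instance, the required number of arithmetic operations is not bounded by any function of the numbers of states and actions alone.
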
Related Papers
50 records in total
  • [1] The value iteration algorithm is not strongly polynomial for discounted dynamic programming
    Feinberg, Eugene A.
    Huang, Jefferson
    OPERATIONS RESEARCH LETTERS, 2014, 42 (02) : 130 - 131
  • [2] MODIFIED POLICY ITERATION ALGORITHMS FOR DISCOUNTED MARKOV DECISION PROBLEMS
    PUTERMAN, ML
    SHIN, MC
    MANAGEMENT SCIENCE, 1978, 24 (11) : 1127 - 1137
  • [3] Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming
    Bertsekas, Dimitri P.
    Yu, Huizhen
    49TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2010, : 1409 - 1416
  • [4] Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming
    Bertsekas, Dimitri P.
    Yu, Huizhen
    MATHEMATICS OF OPERATIONS RESEARCH, 2012, 37 (01) : 66 - 94
  • [5] On policy iteration as a Newton's method and polynomial policy iteration algorithms
    Madani, O
    EIGHTEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-02)/FOURTEENTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE (IAAI-02), PROCEEDINGS, 2002, : 273 - 278
  • [6] Empirical Policy Iteration for Approximate Dynamic Programming
    Haskell, William B.
    Jain, Rahul
    Kalathil, Dileep
    2014 IEEE 53RD ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2014, : 6573 - 6578
  • [7] COMPUTATIONAL COMPARISON OF POLICY ITERATION ALGORITHMS FOR DISCOUNTED MARKOV DECISION-PROCESSES
    HARTLEY, R
    LAVERCOMBE, AC
    THOMAS, LC
    COMPUTERS & OPERATIONS RESEARCH, 1986, 13 (04) : 411 - 420
  • [8] ACTION ELIMINATION PROCEDURES FOR MODIFIED POLICY ITERATION ALGORITHMS
    PUTERMAN, ML
    SHIN, MC
    OPERATIONS RESEARCH, 1982, 30 (02) : 301 - 317
  • [9] AN EFFICIENT POLICY ITERATION ALGORITHM FOR DYNAMIC PROGRAMMING EQUATIONS
    Alla, Alessandro
    Falcone, Maurizio
    Kalise, Dante
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2015, 37 (01) : A181 - A200