Modified policy iteration algorithms are not strongly polynomial for discounted dynamic programming

Cited by: 3
Authors
Feinberg, Eugene A. [1 ]
Huang, Jefferson [1 ]
Scherrer, Bruno [2 ,3 ]
Affiliations
[1] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
[2] Inria, F-54600 Villers Les Nancy, France
[3] Univ Lorraine, LORIA, UMR 7503, F-54506 Vandoeuvre Les Nancy, France
Keywords
Markov decision process; Modified policy iteration; Strongly polynomial; Policy; Algorithm; MARKOV DECISION-PROBLEMS; SIMPLEX; MDPS;
DOI
10.1016/j.orl.2014.07.006
Chinese Library Classification
C93 [Management]; O22 [Operations Research];
Subject Classification Codes
070105; 12; 1201; 1202; 120202;
Abstract
This note shows that the number of arithmetic operations required by any member of a broad class of optimistic policy iteration algorithms to solve a deterministic discounted dynamic programming problem with three states and four actions can be arbitrarily large. Hence no such algorithm is strongly polynomial; in particular, the modified policy iteration and λ-policy iteration algorithms are not strongly polynomial. (C) 2014 Elsevier B.V. All rights reserved.
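For context, modified policy iteration (the best-known member of the optimistic policy iteration class discussed above) alternates one greedy policy improvement step with m partial-evaluation backups under the fixed policy; m = 0 recovers value iteration, and letting m grow without bound recovers policy iteration. Below is a minimal NumPy sketch under assumed conventions: the array shapes, the parameter name m, and the sup-norm stopping rule are illustrative choices, not taken from the paper, and the random demo instance is not the paper's deterministic three-state, four-action counterexample.

```python
import numpy as np

def modified_policy_iteration(P, r, gamma, m=5, tol=1e-8, max_iter=10_000):
    """Modified policy iteration for a discounted MDP (illustrative sketch).

    P: array of shape (A, S, S), P[a, s, t] = transition probability s -> t under a.
    r: array of shape (S, A), one-step rewards.
    gamma: discount factor in [0, 1).
    m: number of partial-evaluation backups per iteration
       (m = 0 gives value iteration; large m approaches policy iteration).
    """
    A, S, _ = P.shape
    v = np.zeros(S)
    pi = np.zeros(S, dtype=int)
    for _ in range(max_iter):
        # Policy improvement: act greedily with respect to the current value.
        q = r + gamma * np.einsum('ast,t->sa', P, v)   # Q(s, a), shape (S, A)
        pi = np.argmax(q, axis=1)
        v_greedy = q[np.arange(S), pi]
        if np.max(np.abs(v_greedy - v)) < tol:         # illustrative stopping rule
            return pi, v_greedy
        v = v_greedy
        # Partial policy evaluation: m extra backups under the fixed policy pi.
        P_pi = P[pi, np.arange(S), :]                  # row s is P[pi[s], s, :]
        r_pi = r[np.arange(S), pi]
        for _ in range(m):
            v = r_pi + gamma * P_pi @ v
    return pi, v

if __name__ == "__main__":
    # Random stochastic MDP for illustration only; the size mirrors the
    # 3-state, 4-action setting from the abstract, not its construction.
    rng = np.random.default_rng(0)
    S, A = 3, 4
    P = rng.random((A, S, S))
    P /= P.sum(axis=2, keepdims=True)
    r = rng.random((S, A))
    pi, v = modified_policy_iteration(P, r, gamma=0.9, m=5)
    print("greedy policy:", pi, "values:", v)
```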
Pages: 429-431
Number of pages: 3
Related Papers
50 records in total
  • [41] Policy Iteration Approximate Dynamic Programming Using Volterra Series Based Actor
    Guo, Wentao
    Si, Jennie
    Liu, Feng
    Mei, Shengwei
    PROCEEDINGS OF THE 2014 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2014, : 249 - 255
  • [42] Robust Modified Policy Iteration
    Kaufman, David L.
    Schaefer, Andrew J.
    INFORMS JOURNAL ON COMPUTING, 2013, 25 (03) : 396 - 410
  • [43] Projections for Approximate Policy Iteration Algorithms
    Akrour, Riad
    Pajarinen, Joni
    Peters, Jan
    Neumann, Gerhard
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019
  • [44] Improved Strongly Polynomial Algorithms for Deterministic MDPs, 2VPI Feasibility, and Discounted All-Pairs Shortest Paths
    Karczmarz, Adam
    PROCEEDINGS OF THE 2022 ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, SODA, 2022, : 154 - 172
  • [45] Discrete-Time Nonzero-Sum Games for Multiplayer Using Policy-Iteration-Based Adaptive Dynamic Programming Algorithms
    Zhang, Huaguang
    Jiang, He
    Luo, Chaomin
    Xiao, Geyang
    IEEE TRANSACTIONS ON CYBERNETICS, 2017, 47 (10) : 3331 - 3340
  • [46] A PERTURBATION APPROACH TO A CLASS OF DISCOUNTED APPROXIMATE VALUE ITERATION ALGORITHMS WITH BOREL SPACES
    Vega-Amaya, Oscar
    Lopez-Borbon, Joaquin
    JOURNAL OF DYNAMICS AND GAMES, 2016, 3 (03): : 261 - 278
  • [47] A note on policy algorithms for discounted Markov decision problems
    Ng, MK
    OPERATIONS RESEARCH LETTERS, 1999, 25 (04) : 195 - 197
  • [48] Policy Iteration Adaptive Dynamic Programming Algorithm for Discrete-Time Nonlinear Systems
    Liu, Derong
    Wei, Qinglai
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2014, 25 (03) : 621 - 634
  • [49] Generalized Policy Iteration Adaptive Dynamic Programming for Discrete-Time Nonlinear Systems
    Liu, Derong
    Wei, Qinglai
    Yan, Pengfei
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2015, 45 (12): : 1577 - 1591
  • [50] Policy iteration-approximate dynamic programming for large scale unit commitment problems
    Wei, Hua
    Long, Danli
    Li, Jinghua
    Zhongguo Dianji Gongcheng Xuebao/Proceedings of the Chinese Society of Electrical Engineering, 2014, 34 (25): : 4420 - 4429