Incremental value iteration for time-aggregated Markov-decision processes

Cited by: 22
Authors
Sun, Tao [1]
Zhao, Qianchuan
Luh, Peter B.
Affiliations
[1] Tsing Hua Univ, Ctr Intelligent & Networked Syst CFINS, Dept Automat, Beijing 100084, Peoples R China
[2] Univ Connecticut, Dept Elect & Comp Engn, Storrs, CT 06269 USA
Keywords
fractional cost; Markov-decision processes (MDPs); policy iteration; time aggregation; value iteration
DOI
10.1109/TAC.2007.908359
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A value iteration algorithm for time-aggregated Markov-decision processes (MDPs) is developed to solve problems with large state spaces. The algorithm is based on a novel approach that solves a time-aggregated MDP by incrementally solving a set of standard MDPs. As a result, the algorithm converges under the same assumption as standard value iteration, which is much weaker than the assumption required by the existing time-aggregated value iteration algorithm. The algorithms developed in this paper are also applicable to MDPs with fractional costs.
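For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of standard value iteration on a small discounted-cost MDP. It is not the paper's incremental algorithm for time-aggregated MDPs. The function name `value_iteration`, the discount factor `gamma`, and the two-state example are all assumptions made for illustration only.

```python
def value_iteration(P, c, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Standard value iteration for a finite discounted-cost MDP.

    P[a][s][t] is the probability of moving from state s to state t
    under action a; c[a][s] is the one-step cost of action a in state s.
    Returns the (approximately) optimal value function and a greedy policy.
    """
    n_actions = len(P)
    n_states = len(P[0])
    V = [0.0] * n_states
    for _ in range(max_iter):
        # Q[s][a]: one-step cost plus discounted expected future cost.
        Q = [[c[a][s] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
              for a in range(n_actions)]
             for s in range(n_states)]
        V_new = [min(q) for q in Q]
        diff = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if diff < tol:  # stop once the Bellman update no longer changes V
            break
    # Greedy policy with respect to the last computed Q-values.
    policy = [min(range(n_actions), key=lambda a: Q[s][a])
              for s in range(n_states)]
    return V, policy


# Hypothetical example: action 0 stays in place, action 1 switches state.
# Staying costs 1 in state 0 and 0 in state 1; switching always costs 0.5.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
c = [
    [1.0, 0.0],  # cost of staying, by state
    [0.5, 0.5],  # cost of switching, by state
]
V, policy = value_iteration(P, c)
```

In this toy problem the greedy policy switches out of the costly state 0 and then stays in state 1. Per the abstract, the paper's approach would apply an update of this standard kind repeatedly to a sequence of standard MDPs rather than to a single one.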
Pages: 2177-2182
Page count: 6
Related papers
50 in total
  • [21] IntervalMDP.jl: Accelerated Value Iteration for Interval Markov Decision Processes
    Mathiesen, Frederik Baymler
    Lahijanian, Morteza
    Laurenti, Luca
    IFAC PAPERSONLINE, 2024, 58 (11): : 1 - 6
  • [22] Variance reduced value iteration and faster algorithms for solving Markov decision processes
    Sidford, Aaron
    Wang, Mengdi
    Wu, Xian
    Ye, Yinyu
    NAVAL RESEARCH LOGISTICS, 2023, 70 (05) : 423 - 442
  • [23] A pause control approach to the value iteration scheme in average Markov decision processes
    Cavazos-Cadena, R
    SYSTEMS & CONTROL LETTERS, 1998, 33 (04) : 209 - 219
  • [24] A method for speeding up value iteration in partially observable Markov decision processes
    Zhang, NL
    Lee, SS
    Zhang, WH
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 1999, : 696 - 703
  • [25] ISOTONE POLICIES FOR THE VALUE-ITERATION METHOD FOR MARKOV DECISION-PROCESSES
    WHITE, DJ
    OR SPEKTRUM, 1984, 6 (04) : 223 - 227
  • [26] Variance Reduced Value Iteration and Faster Algorithms for Solving Markov Decision Processes
    Sidford, Aaron
    Wang, Mengdi
    Wu, Xian
    Ye, Yinyu
    SODA'18: PROCEEDINGS OF THE TWENTY-NINTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2018, : 770 - 787
  • [27] Value Iteration and Action ε-Approximation of Optimal Policies in Discounted Markov Decision Processes
    Montes-De-Oca, Raul
    Lemus-Rodriguez, Enrique
    RECENT ADVANCES IN APPLIED MATHEMATICS, 2009, : 213 - +
  • [28] Value Iteration for Long-Run Average Reward in Markov Decision Processes
    Ashok, Pranav
    Chatterjee, Krishnendu
    Daca, Przemyslaw
    Kretinsky, Jan
    Meggendorfer, Tobias
    COMPUTER AIDED VERIFICATION, CAV 2017, PT I, 2017, 10426 : 201 - 221
  • [29] Speeding up the convergence of value iteration in partially observable Markov decision processes
    Zhang, NL
    Zhang, WH
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2001, 14 : 29 - 51
  • [30] A Note on Generalized Second-Order Value Iteration in Markov Decision Processes
    Vijesh, Villavarayan Antony
    Rudresha, Shreyas Sumithra
    Abdulla, Mohammed Shahid
    JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 2023, 199 (03) : 1022 - 1049