Incremental value iteration for time-aggregated Markov-decision processes

Cited by: 22
Authors
Sun, Tao [1]
Zhao, Qianchuan
Luh, Peter B.
Affiliations
[1] Tsing Hua Univ, Ctr Intelligent & Networked Syst CFINS, Dept Automat, Beijing 100084, Peoples R China
[2] Univ Connecticut, Dept Elect & Comp Engn, Storrs, CT 06269 USA
Keywords
fractional cost; Markov-decision processes (MDPs); policy iteration; time aggregation; value iteration
DOI
10.1109/TAC.2007.908359
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A value iteration algorithm for time-aggregated Markov-decision processes (MDPs) is developed to solve problems with large state spaces. The algorithm is based on a novel approach that solves a time-aggregated MDP by incrementally solving a set of standard MDPs. As a result, the algorithm converges under the same assumption as standard value iteration, which is much weaker than the assumption required by the existing time-aggregated value iteration algorithm. The algorithms developed in this paper are also applicable to MDPs with fractional costs.
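For orientation, the "standard value iteration" that the abstract uses as its baseline can be sketched as follows. This is a minimal illustration and not the paper's incremental algorithm: it assumes a small finite MDP with a discounted-cost criterion (the paper's own criterion may differ), and the names `value_iteration`, `P`, `c`, and `gamma`, along with the two-state example data, are hypothetical.

```python
def q_values(P, c, v, gamma):
    """One Bellman backup: q[s][a] = c[a][s] + gamma * sum_t P[a][s][t] * v[t]."""
    n_actions, n_states = len(P), len(P[0])
    return [[c[a][s] + gamma * sum(P[a][s][t] * v[t] for t in range(n_states))
             for a in range(n_actions)]
            for s in range(n_states)]


def value_iteration(P, c, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Discounted-cost value iteration on a finite MDP (illustrative only).

    P[a][s][t]: probability of moving from state s to state t under action a.
    c[a][s]:    one-step cost of taking action a in state s.
    Returns (values, greedy_policy), minimizing expected discounted cost.
    """
    n_states = len(P[0])
    v = [0.0] * n_states
    for _ in range(max_iter):
        v_new = [min(row) for row in q_values(P, c, v, gamma)]
        # Stop once successive value estimates agree to within tol.
        converged = max(abs(x - y) for x, y in zip(v_new, v)) < tol
        v = v_new
        if converged:
            break
    # Greedy policy with respect to the final value estimate.
    q = q_values(P, c, v, gamma)
    policy = [min(range(len(P)), key=lambda a, s=s: q[s][a])
              for s in range(n_states)]
    return v, policy


# Hypothetical two-state example: action 0 stays put, action 1 switches state.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
c = [
    [1.0, 0.0],  # staying costs 1 in state 0 and nothing in state 1
    [2.0, 2.0],  # switching costs 2 from either state
]
values, policy = value_iteration(P, c)
```

In this toy problem the cheaper long-run choice from state 0 is to pay once to switch and then stay in state 1 for free, which is what the greedy policy recovers.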
Pages: 2177-2182
Page count: 6
Related papers
50 in total
  • [1] A unified approach to time-aggregated Markov decision processes
    Li, Yanjie
    Wu, Xinyu
    AUTOMATICA, 2016, 67 : 77 - 84
  • [2] Value set iteration for Markov decision processes
    Chang, Hyeong Soo
    AUTOMATICA, 2014, 50 (07) : 1940 - 1943
  • [3] Topological Value Iteration Algorithm for Markov Decision Processes
    Dai, Peng
    Goldsmith, Judy
    20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007, : 1860 - 1865
  • [4] New prioritized value iteration for Markov decision processes
    de Guadalupe Garcia-Hernandez, Ma.
    Ruiz-Pinales, Jose
    Onaindia, Eva
    Gabriel Avina-Cervantes, J.
    Ledesma-Orozco, Sergio
    Alvarado-Mendez, Edgar
    Reyes-Ballesteros, Alberto
    ARTIFICIAL INTELLIGENCE REVIEW, 2012, 37 (02) : 157 - 167
  • [5] New prioritized value iteration for Markov decision processes
    Ma. de Guadalupe Garcia-Hernandez
    Jose Ruiz-Pinales
    Eva Onaindia
    J. Gabriel Aviña-Cervantes
    Sergio Ledesma-Orozco
    Edgar Alvarado-Mendez
    Alberto Reyes-Ballesteros
    Artificial Intelligence Review, 2012, 37 : 157 - 167
  • [6] A study of value iteration and policy iteration for Markov decision processes in Deterministic systems
    Zheng, Haifeng
    Wang, Dan
    AIMS MATHEMATICS, 2024, 9 (12) : 33818 - 33842
  • [7] A Modified Value Iteration Algorithm for Discounted Markov Decision Processes
    Chafik, Sanaa
    Daoui, Cherki
    JOURNAL OF ELECTRONIC COMMERCE IN ORGANIZATIONS, 2015, 13 (03) : 47 - 57
  • [8] The value iteration method for countable state Markov decision processes
    Aviv, Y
    Federgruen, A
    OPERATIONS RESEARCH LETTERS, 1999, 24 (05) : 223 - 234
  • [9] ON CONVERGENCE OF VALUE ITERATION FOR A CLASS OF TOTAL COST MARKOV DECISION PROCESSES
    Yu, Huizhen
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2015, 53 (04) : 1982 - 2016
  • [10] Advantage Based Value Iteration for Markov Decision Processes with Unknown Rewards
    Alizadeh, Pegah
    Chevaleyre, Yann
    Levy, Francois
    2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2016, : 3837 - 3844