Incremental value iteration for time-aggregated Markov-decision processes

Cited by: 22

Authors:

Sun, Tao [1]

Zhao, Qianchuan

Luh, Peter B.

Affiliations:

[1] Tsing Hua Univ, Ctr Intelligent & Networked Syst CFINS, Dept Automat, Beijing 100084, Peoples R China

[2] Univ Connecticut, Dept Elect & Comp Engn, Storrs, CT 06269 USA

Source:

IEEE TRANSACTIONS ON AUTOMATIC CONTROL | 2007 / Vol. 52 / No. 11

Keywords:

fractional cost; Markov-decision processes (MDPs); policy iteration; time aggregation; value iteration

DOI:

10.1109/TAC.2007.908359

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

A value iteration algorithm for time-aggregated Markov-decision processes (MDPs) is developed to solve problems with large state spaces. The algorithm is based on a novel approach that solves a time-aggregated MDP by incrementally solving a set of standard MDPs. The algorithm therefore converges under the same assumption as standard value iteration, which is much weaker than the assumption required by the existing time-aggregated value iteration algorithm. The algorithms developed in this paper are also applicable to MDPs with fractional costs.


Pages: 2177-2182

Page count: 6
