A note on policy algorithms for discounted Markov decision problems

Cited by: 2
Author
Ng, MK [1]
Affiliation
[1] Univ Hong Kong, Dept Math, Pokfulam Rd, Hong Kong, Peoples R China
Keywords
discounted Markov decision process; policy algorithm; matrices
DOI
10.1016/S0167-6377(99)00051-6
Chinese Library Classification
C93 [Management science]; O22 [Operations research]
Discipline classification code
070105; 12; 1201; 1202; 120202
Abstract
In this note, we show that the evaluation phase in the policy iteration algorithm for the infinite horizon discounted Markov decision problem can be done in O(mN^2) operations, where N is the number of states of the Markov decision process and m is the number of states in which the decision changes during the policy improvement phase. © 1999 Elsevier Science B.V. All rights reserved.
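
The abstract does not say how the O(mN^2) bound is obtained. One standard way to reach that cost, assumed here for illustration and not confirmed by this record, is a rank-m Sherman-Morrison-Woodbury update: if the policy changes in m states, only m rows of the transition matrix change, so the inverse of I - gamma*P can be updated instead of recomputed. The function name `update_inverse` and its parameters are hypothetical.

```python
import numpy as np

def update_inverse(B, P_old, P_new, changed, gamma):
    """Sketch: return inv(I - gamma*P_new) given B = inv(I - gamma*P_old).

    Assumes P_new differs from P_old only in the rows listed in `changed`
    (the m states whose decision changed). Uses the Woodbury identity,
    which is an assumption about the technique, not taken from the paper.
    """
    S = np.asarray(changed, dtype=int)
    m = S.size
    if m == 0:
        return B.copy()
    # I - gamma*P_new = (I - gamma*P_old) + U V, where U holds the
    # identity columns indexed by S and V = -gamma * (changed rows).
    V = -gamma * (P_new[S, :] - P_old[S, :])   # m x N
    VB = V @ B                                 # m x N, costs O(m N^2)
    K = np.eye(m) + VB[:, S]                   # m x m capacitance matrix
    X = np.linalg.solve(K, VB)                 # O(m^3 + m^2 N)
    return B - B[:, S] @ X                     # N x N, costs O(m N^2)
```

Given the updated inverse, the value vector of the new policy follows from one matrix-vector product with that policy's reward vector, which costs O(N^2) and does not change the overall bound.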
Pages: 195-197
Number of pages: 3