Sample-path optimality and variance-maximization for Markov decision processes

Cited by: 3

Authors:

Zhu, Q. X. [1]

Affiliation:

[1] S China Normal Univ, Dept Math, Guangzhou 510631, Peoples R China

Source:

MATHEMATICAL METHODS OF OPERATIONS RESEARCH | 2007 / Vol. 65 / Issue 03

Keywords:

discrete-time Markov decision process; unbounded reward; sample-path reward criterion; variance-maximization; optimal stationary policy;

DOI:

10.1007/s00186-006-0126-9

Chinese Library Classification (CLC) codes:

C93 [Management Science]; O22 [Operations Research];

Discipline classification codes:

070105 ; 12 ; 1201 ; 1202 ; 120202 ;

Abstract:

This paper studies both the average sample-path reward (ASPR) criterion and the limiting average variance criterion for denumerable discrete-time Markov decision processes. The rewards may have neither upper nor lower bounds. We give sufficient conditions on the system's primitive data under which we prove the existence of ASPR-optimal stationary policies and variance optimal policies. Our conditions are weaker than those in the previous literature. Moreover, our results are illustrated by a controlled queueing system.
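
For orientation, the two criteria named in the abstract are commonly written as follows in the sample-path literature. This is a sketch only: the symbols $r$, $x_t$, $a_t$, $\pi$, $J_S$ and $V$ are assumed notation and are not taken from this record, and the paper's own definitions may differ in detail.

```latex
% Assumed notation: r(x_t, a_t) is the one-step reward at state x_t
% under action a_t; \pi is a policy; x is the initial state.

% Average sample-path reward along a single trajectory (pathwise, no expectation):
J_S(\pi, x) \;=\; \liminf_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} r(x_t, a_t)

% One common form of the limiting average variance of the n-step total reward:
V(\pi, x) \;=\; \limsup_{n \to \infty} \frac{1}{n}\,
  \mathrm{Var}_x^{\pi}\!\Bigl( \sum_{t=0}^{n-1} r(x_t, a_t) \Bigr)
```

Under this reading, an ASPR-optimal policy attains the best achievable value of $J_S$ almost surely, not just in expectation.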


Pages: 519 - 538

Number of pages: 20

50 records in total

[1] Sample-path optimality and variance-maximization for Markov decision processes
Q. X. Zhu
Mathematical Methods of Operations Research, 2007, 65 : 519 - 538
[2] Sample-path average optimality for Markov control processes
Lasserre, JB
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1999, 44 (10) : 1966 - 1971
[3] Sample-path optimality and variance-minimization of average cost Markov control processes
Hernández-Lerma, O
Vega-Amaya, O
Carrasco, G
SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 1999, 38 (01) : 79 - 93
[4] Sample-path optimality and variance-minimization of average cost Markov control processes
Hernández-Lerma, Onésimo
Vega-Amaya, Oscar
Carrasco, Guadalupe
SIAM Journal on Control and Optimization, 38 (01) : 79 - 93
[5] Average sample-path optimality for continuous-time Markov decision processes in Polish spaces
Zhu, Quan-xin
ACTA MATHEMATICAE APPLICATAE SINICA-ENGLISH SERIES, 2011, 27 (04) : 613 - 624
[6] Average sample-path optimality for continuous-time Markov decision processes in Polish spaces
Quan-xin Zhu
Acta Mathematicae Applicatae Sinica, English Series, 2011, 27 : 613 - 624
[7] A Sensitivity-Based Construction Approach to Sample-Path Variance Minimization of Markov Decision Processes
Huang, Yonghao
Chen, Xi
2012 2ND AUSTRALIAN CONTROL CONFERENCE (AUCC), 2012 : 215 - 220
[8] A Counterexample on Sample-Path Optimality in Stable Markov Decision Chains with the Average Reward Criterion
Rolando Cavazos-Cadena
Raúl Montes-de-Oca
Karel Sladký
Journal of Optimization Theory and Applications, 2014, 163 : 674 - 684
[9] A Counterexample on Sample-Path Optimality in Stable Markov Decision Chains with the Average Reward Criterion
Cavazos-Cadena, Rolando
Montes-de-Oca, Raul
Sladky, Karel
JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 2014, 163 (02) : 674 - 684
[10] Sample-path and variance minimization of Markov control processes with average cost criteria
Hernández-Lerma, O
Vega-Amaya, O
Carrasco, G
PROCEEDINGS OF THE 39TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-5, 2000 : 1172 - 1176
