ON FINDING OPTIMAL POLICIES FOR MARKOV DECISION CHAINS - A UNIFYING FRAMEWORK FOR MEAN-VARIANCE-TRADEOFFS

Cited by: 20

Authors:

HUANG, Y

KALLENBERG, LCM

Source:

MATHEMATICS OF OPERATIONS RESEARCH | 1994 / Vol. 19 / No. 2

Keywords:

MARKOV DECISION CHAINS; OPTIMAL POLICIES;

DOI:

10.1287/moor.19.2.434

Chinese Library Classification (CLC) codes:

C93 [Management science]; O22 [Operations research];

Discipline classification codes:

070105 ; 12 ; 1201 ; 1202 ; 120202 ;

Abstract:

This paper proves constructively the existence of optimal policies for maximum one-period mean-to-standard-deviation-ratio, negative variance-with-bounded-mean and mean-penalized-by-variance Markov decision chains by reducing them to a related mathematical program. This program entails maximizing (xB/D(xb)) + C(xb) over x in a polytope and with given bounds on xb, where C and D are convex and either D is constant or D is positive and nondecreasing, C is nondecreasing and xB is nonpositive. This program is in turn reduced to maximizing x(B + θb) over x in the polytope parametrically in θ. Along the way, under the nonnegative-initial-distribution assumption, we generalize the rule of constructing a stationary maximum-average-reward policy from an extreme optimal solution of the associated linear program. The paper unifies and extends formulations and existence results for problems discussed by White (1987), Filar and Lee (1985), Sobel (1985), Kawai (1987) and Filar, Kallenberg and Lee (1989), and gives an effective computational procedure to solve them that is related to a method used by Kawai (1987) in a special case.
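
As a hedged illustration of the reduction described in the abstract, the mean-penalized-by-variance case with constant D can be written out as follows. The symbols $r$ (one-period reward vector), $r^{2}$ (its componentwise square), $\lambda \ge 0$ (penalty weight), $X$ (the polytope) and $C^{*}$ (the convex conjugate of $C$) are assumptions introduced here and do not appear in the record. Here $x$ is read as a row vector of state-action frequencies whose components sum to one, and the bounds on $xb$ are ignored. This is a sketch under those assumptions, not the paper's own derivation.

```latex
% Assumed notation: r, r^2, \lambda, X and C^* are illustrative, not from the record.
\begin{align*}
\text{mean} - \lambda \cdot \text{variance}
  &= x r - \lambda \bigl( x r^{2} - (x r)^{2} \bigr) \\
  &= x \,( r - \lambda r^{2} ) + \lambda \,( x r )^{2} \\
  &= xB + C(xb),
  \qquad B = r - \lambda r^{2},\quad b = r,\quad D \equiv 1,\quad C(t) = \lambda t^{2}.
\end{align*}

% C is convex and continuous, so it equals its biconjugate:
%   C(t) = \sup_{\theta} \{ \theta t - C^{*}(\theta) \}.
\begin{align*}
\max_{x \in X} \bigl[ xB + C(xb) \bigr]
  &= \max_{x \in X} \, \sup_{\theta} \bigl[ xB + \theta \, xb - C^{*}(\theta) \bigr] \\
  &= \sup_{\theta} \Bigl[ \max_{x \in X} x ( B + \theta b ) - C^{*}(\theta) \Bigr].
\end{align*}
```

Under this reading, each fixed θ gives a linear program over the polytope, which is consistent with the abstract's statement that the problem reduces to maximizing x(B + θb) parametrically in θ.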

Pages: 434-448

Page count: 15
