ON FINDING OPTIMAL POLICIES FOR MARKOV DECISION CHAINS - A UNIFYING FRAMEWORK FOR MEAN-VARIANCE-TRADEOFFS

Cited by: 20

Authors:

HUANG, Y

KALLENBERG, LCM

Affiliations:

Source:

MATHEMATICS OF OPERATIONS RESEARCH | 1994 / Vol. 19 / No. 2

Keywords:

MARKOV DECISION CHAINS; OPTIMAL POLICIES;

DOI:

10.1287/moor.19.2.434

Chinese Library Classification (CLC) codes:

C93 [Management Science]; O22 [Operations Research];

Discipline classification codes:

070105 ; 12 ; 1201 ; 1202 ; 120202 ;

Abstract:

This paper proves constructively the existence of optimal policies for maximum one-period mean-to-standard-deviation-ratio, negative variance-with-bounded-mean and mean-penalized-by-variance Markov decision chains by reducing them to a related mathematical program. This program entails maximizing (xB/D(xb)) + C(xb) over x in a polytope and with given bounds on xb where C and D are convex and either D is constant or D is positive and nondecreasing, C is nondecreasing and xB is nonpositive. This program is in turn reduced to maximizing x(B + θb) over x in the polytope parametrically in θ. Along the way, under the nonnegative-initial-distribution assumption, we generalize the rule of constructing a stationary maximum-average-reward policy from an extreme optimal solution of the associated linear program. The paper unifies and extends formulations and existence results for problems discussed by White (1987), Filar and Lee (1985), Sobel (1985), Kawai (1987) and Filar, Kallenberg and Lee (1989), and gives an effective computational procedure to solve them that is related to a method used by Kawai (1987) in a special case.
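
As a rough illustration of the reduction described in the abstract, the Python sketch below sweeps θ, records a vertex that maximizes the linear objective x(B + θb) for each value, and then evaluates the original objective (xB/D(xb)) + C(xb) at the candidates found. It is not the authors' procedure. The polytope is represented by a hypothetical explicit vertex list, D is taken to be constant (one of the two cases named in the abstract), C(t) = t² is an arbitrary convex choice, the bounds on xb are assumed to be inactive, and θ is swept over a coarse grid in place of exact parametric linear programming. All function names and data are invented for the example.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def parametric_candidates(vertices, B, b, thetas):
    """For each theta, pick a vertex maximizing x(B + theta*b).

    A linear function on a polytope attains its maximum at a vertex,
    so scanning the (hypothetical) vertex list is enough for this toy.
    """
    found = []
    for theta in thetas:
        best = max(vertices, key=lambda x: dot(x, B) + theta * dot(x, b))
        if best not in found:
            found.append(best)
    return found


def objective(x, B, b, C, D):
    """Original objective from the abstract: xB / D(xb) + C(xb)."""
    t = dot(x, b)
    return dot(x, B) / D(t) + C(t)


# Toy data (assumptions, not taken from the paper).
vertices = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
B = (-1.0, -2.0, -4.0)   # xB is nonpositive on this polytope
b = (1.0, 2.0, 3.0)
C = lambda t: t * t      # convex
D = lambda t: 1.0        # constant case

thetas = [k / 10 for k in range(-50, 51)]  # coarse grid on [-5, 5]
candidates = parametric_candidates(vertices, B, b, thetas)
best = max(candidates, key=lambda x: objective(x, B, b, C, D))

print("candidates:", candidates)
print("best vertex:", best, "objective:", objective(best, B, b, C, D))
```

In this toy instance the sweep visits all three vertices, and the best candidate agrees with a brute-force check over the vertex list. A grid can miss a vertex that is optimal only on a very narrow range of θ, which is one reason an exact parametric method would be preferred in practice.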

Citation

Pages: 434-448

Page count: 15

