MEAN-VARIANCE TRADEOFFS IN AN UNDISCOUNTED MDP

Cited by: 49
Author
SOBEL, MJ
Affiliations
[1] SUNY STONY BROOK,INST DECIS SCI,STONY BROOK,NY 11794
[2] SUNY STONY BROOK,DEPT APPL MATH & STAT,STONY BROOK,NY 11794
Keywords
DYNAMIC PROGRAMMING, MARKOV; MEAN-VARIANCE TRADEOFF; PROGRAMMING, MULTIPLE CRITERIA
DOI
10.1287/opre.42.1.175
Chinese Library Classification
C93 [Management Science]
Discipline Codes
12; 1201; 1202; 120202
Abstract
A stationary policy and an initial state in an MDP (Markov decision process) induce a stationary probability distribution of the reward. The problem analyzed here is generating the Pareto optima in the sense of high mean and low variance of the stationary distribution. In the unichain case, Pareto optima can be computed either with policy improvement or with a linear program having the same number of variables and one more constraint than the formulation for gain-rate optimization. The same linear program suffices in the multichain case if the ergodic class is an element of choice.
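
To illustrate the kind of linear program the abstract describes (a rough sketch, not Sobel's exact formulation): for a unichain MDP in state-action frequency variables x(s, a), the standard gain-rate LP maximizes the stationary mean reward subject to balance and normalization constraints; adding one constraint that fixes the mean at a target m and minimizing the second moment of the reward yields the minimum achievable variance E[R^2] - m^2 at that mean. The Python sketch below assumes transition matrices P[a] and a reward table r are given; the function name pareto_point and its interface are hypothetical.

import numpy as np
from scipy.optimize import linprog

def pareto_point(P, r, m):
    """Minimum stationary variance at target mean m, for a unichain MDP.

    P: array of shape (A, S, S), P[a, s, j] = Pr(next state j | state s, action a)
    r: array of shape (S, A) of one-step rewards
    m: target stationary mean reward
    Returns (variance, frequencies) or None if m is not achievable.
    """
    A, S, _ = P.shape
    n = S * A                      # one variable x(s, a) per state-action pair
    c = (r ** 2).reshape(n)        # objective: second moment of the reward

    # Equality constraints: S balance rows, one normalization row, and the
    # single extra row (versus the gain-rate LP) pinning the mean at m.
    A_eq = np.zeros((S + 2, n))
    for s in range(S):
        for a in range(A):
            k = s * A + a
            A_eq[s, k] += 1.0          # outflow from state s
            A_eq[:S, k] -= P[a, s, :]  # inflow to each state j
    A_eq[S, :] = 1.0                   # frequencies sum to one
    A_eq[S + 1, :] = r.reshape(n)      # mean reward equals m
    b_eq = np.zeros(S + 2)
    b_eq[S] = 1.0
    b_eq[S + 1] = m

    # One balance row is redundant given normalization; HiGHS presolve
    # tolerates the dependent equality row.
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n)
    if not res.success:
        return None
    return res.fun - m ** 2, res.x.reshape(S, A)

Sweeping m across the achievable range of gain rates and recording each (m, variance) pair traces out the mean-variance Pareto frontier; the LP has the same n variables as the gain-rate formulation and exactly one more constraint, matching the count stated in the abstract.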
Pages: 175-183 (9 pages)