Optimal Markov Policies for Finite-Horizon Constrained MDPs With Combined Additive and Multiplicative Utilities

Cited by: 0
Authors
Kumar, Uday M. [1]
Kavitha, Veeraruna [2]
Bhat, Sanjay P. [1]
Hemachandra, Nandyala [2]
Affiliations
[1] TCS Res, Hyderabad 500081, India
[2] Indian Inst Technol, Dept Ind Engn & Operat Res, Mumbai 400076, India
Keywords
Bilinear program; Markov decision processes; Markov policies; Optimal control; Utilities
DOI
10.1109/LCSYS.2023.3283470
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This letter considers the problem of optimizing a finite-horizon constrained Markov decision process (CMDP) where the objective and constraints are sums of additive and multiplicative utilities. To solve this, we construct another CMDP with only additive utilities whose optimal value over a restricted set of policies is equal to that of the original CMDP. Further, we provide a finite-dimensional bilinear program (BLP) whose value equals the CMDP value and whose solution provides the optimal policy. We also suggest an algorithm to solve the proposed BLP.
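To illustrate the kind of objective the abstract describes, the sketch below evaluates a fixed Markov policy under one additive and one multiplicative utility by backward recursion over the horizon. It is only an illustration under stated assumptions, not the letter's BLP or algorithm: the multiplicative utility is assumed to be the expected product of per-step factors `m[s][a]`, rewards and transitions are assumed time-homogeneous, and all names (`evaluate_markov_policy`, `P`, `r`, `m`, `policy`, `mu0`) are hypothetical.

```python
def evaluate_markov_policy(P, r, m, policy, mu0, horizon):
    """Return (additive, multiplicative) expected utilities of a Markov policy.

    Assumed (hypothetical) inputs, all plain nested lists:
      P[s][a][s2]      transition probability from s to s2 under action a
      r[s][a]          per-step additive reward
      m[s][a]          per-step multiplicative factor
      policy[t][s][a]  probability of taking action a in state s at time t
      mu0[s]           initial state distribution
    """
    n_states = len(P)
    n_actions = len(P[0])
    V = [0.0] * n_states  # additive value-to-go: empty sum is 0
    W = [1.0] * n_states  # multiplicative value-to-go: empty product is 1
    for t in reversed(range(horizon)):
        V_new, W_new = [], []
        for s in range(n_states):
            v = 0.0
            w = 0.0
            for a in range(n_actions):
                next_V = sum(P[s][a][s2] * V[s2] for s2 in range(n_states))
                next_W = sum(P[s][a][s2] * W[s2] for s2 in range(n_states))
                # Additive utility: reward now plus expected value-to-go.
                v += policy[t][s][a] * (r[s][a] + next_V)
                # Multiplicative utility: factor now times expected value-to-go.
                w += policy[t][s][a] * m[s][a] * next_W
            V_new.append(v)
            W_new.append(w)
        V, W = V_new, W_new
    additive = sum(mu0[s] * V[s] for s in range(n_states))
    multiplicative = sum(mu0[s] * W[s] for s in range(n_states))
    return additive, multiplicative


# Toy model (made up for illustration): action 0 stays, action 1 switches state.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
r = [[1.0, 0.0], [0.0, 2.0]]
m = [[2.0, 1.0], [1.0, 3.0]]
always_switch = [[[0.0, 1.0], [0.0, 1.0]] for _ in range(2)]

add_u, mult_u = evaluate_markov_policy(P, r, m, always_switch, [1.0, 0.0], horizon=2)
combined = add_u + mult_u  # the combined objective is the sum of the two utilities
```

The multiplicative recursion works for Markov policies because the action at time t depends only on the current state, so the expected product factorizes step by step in the same way the expected sum does.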
Pages: 2029-2034
Number of pages: 6