Optimal Markov Policies for Finite-Horizon Constrained MDPs With Combined Additive and Multiplicative Utilities

Cited by: 0

Authors:

Kumar, Uday M. [1]

Kavitha, Veeraruna [2]

Bhat, Sanjay P. [1]

Hemachandra, Nandyala [2]

Affiliations:

[1] TCS Res, Hyderabad 500081, India

[2] Indian Inst Technol, Dept Ind Engn & Operat Res, Mumbai 400076, India

Source:

IEEE CONTROL SYSTEMS LETTERS | 2023 / Vol. 7

Keywords:

Bilinear program; Markov decision processes; Markov policies; Optimal control; Utilities

DOI:

10.1109/LCSYS.2023.3283470

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

This letter considers the problem of optimizing a finite-horizon constrained Markov decision process (CMDP) where the objective and constraints are sums of additive and multiplicative utilities. To solve this, we construct another CMDP with only additive utilities whose optimal value over a restricted set of policies is equal to that of the original CMDP. Further, we provide a finite-dimensional bilinear program (BLP) whose value equals the CMDP value and whose solution provides the optimal policy. We also suggest an algorithm to solve the proposed BLP.
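As a rough illustration of the kind of objective the abstract describes, the sketch below evaluates a fixed Markov policy on a toy finite-horizon MDP and returns both an additive utility (expected sum of per-step rewards) and a multiplicative utility (expected product of per-step factors). It is not the letter's bilinear program or its algorithm, and it handles no constraints. The function `evaluate_policy`, the arrays `P`, `r`, `m`, `policy`, and all numbers are hypothetical and chosen only for this example.

```python
def evaluate_policy(P, r, m, policy, horizon):
    """Evaluate a Markov policy by backward recursion (illustrative sketch).

    P[s][a][s2]     : transition probability (hypothetical toy model)
    r[s][a]         : per-step additive reward
    m[s][a]         : per-step multiplicative factor
    policy[t][s][a] : probability of taking action a in state s at time t
    Returns two lists indexed by start state:
    E[sum of rewards] and E[product of factors].
    """
    n_states = len(P)
    add = [0.0] * n_states  # no reward remains after the horizon
    mul = [1.0] * n_states  # empty product after the horizon
    for t in reversed(range(horizon)):
        new_add, new_mul = [], []
        for s in range(n_states):
            a_val, m_val = 0.0, 0.0
            for a, p_a in enumerate(policy[t][s]):
                next_add = sum(p * add[s2] for s2, p in enumerate(P[s][a]))
                next_mul = sum(p * mul[s2] for s2, p in enumerate(P[s][a]))
                a_val += p_a * (r[s][a] + next_add)   # additive recursion
                m_val += p_a * m[s][a] * next_mul     # multiplicative recursion
            new_add.append(a_val)
            new_mul.append(m_val)
        add, mul = new_add, new_mul
    return add, mul


# Toy model (assumed): 2 states, 2 actions, state 1 is absorbing.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [0.0, 1.0]]]
r = [[1.0, 0.0], [2.0, 2.0]]
m = [[0.5, 1.0], [2.0, 2.0]]
# Same deterministic decision rule at both time steps.
policy = [[[0.0, 1.0], [1.0, 0.0]],
          [[0.0, 1.0], [1.0, 0.0]]]

add, mul = evaluate_policy(P, r, m, policy, horizon=2)
combined = [x + y for x, y in zip(add, mul)]  # sum of the two utilities
print(combined)
```

The multiplicative recursion relies on the policy being Markov, so the expected product factorizes one step at a time in the same way the expected sum does.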


Pages: 2029 - 2034

Number of pages: 6

50 items in total

[1] Optimal policies for a finite-horizon batching inventory model
Al-Khamis, Talal M.
Benkherouf, Lakdere
Omar, Mohamed
INTERNATIONAL JOURNAL OF SYSTEMS SCIENCE, 2014, 45 (10) : 2196 - 2202
[2] Optimal Policies for a Finite-Horizon Production Inventory Model
Benkherouf, Lakdere
Boushehri, Dalal
ADVANCES IN OPERATIONS RESEARCH, 2012, 2012
[3] Decomposition Methods for Solving Finite-Horizon Large MDPs
el Akraoui, Bouchra
Daoui, Cherki
Larach, Abdelhadi
Rahhali, Khalid
JOURNAL OF MATHEMATICS, 2022, 2022
[4] The nature of optimal policies for deterministic finite-horizon inventory models
Benkherouf, Lakdere
Gilding, Brian H.
INTERNATIONAL JOURNAL OF SYSTEMS SCIENCE-OPERATIONS & LOGISTICS, 2022, 9 (01) : 39 - 60
[5] Finding the K best policies in a finite-horizon Markov decision process
Nielsen, Lars Relund
Kristensen, Anders Ringgaard
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2006, 175 (02) : 1164 - 1179
[6] Finite-Horizon Optimal Transmission Policies for Energy Harvesting Sensors
Vaze, Rahul
Jagannathan, Krishna
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
[7] Parametrized actor-critic algorithms for finite-horizon MDPs
Abdulla, Mohammed Shahid
Bhatnagar, Shalabh
2007 AMERICAN CONTROL CONFERENCE, VOLS 1-13, 2007, : 2701 - 2706
[8] SIFTER: Space-Efficient Value Iteration for Finite-Horizon MDPs
Skitsas, Konstantinos
Papageorgiou, Ioannis G.
Talebi, Mohammad Sadegh
Kantere, Verena
Katehakis, Michael N.
Karras, Panagiotis
PROCEEDINGS OF THE VLDB ENDOWMENT, 2022, 16 (01) : 90 - 98
[9] Adaptive dynamic programming for terminally constrained finite-horizon optimal control problems
Andrews, L.
Klotz, J. R.
Kamalapurkar, R.
Dixon, W. E.
2014 IEEE 53RD ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2014, : 5095 - 5100
[10] Finite-horizon variance penalised Markov decision processes
Collins E.J.
Operations-Research-Spektrum, 1997, 19 (1) : 35 - 39
