Optimal Markov Policies for Finite-Horizon Constrained MDPs With Combined Additive and Multiplicative Utilities

Authors
Kumar, Uday M. [1]
Kavitha, Veeraruna [2]
Bhat, Sanjay P. [1]
Hemachandra, Nandyala [2]
Affiliations
[1] TCS Res, Hyderabad 500081, India
[2] Indian Inst Technol, Dept Ind Engn & Operat Res, Mumbai 400076, India
Keywords
Bilinear program; Markov decision processes; Markov policies; Optimal control; Utilities
DOI
10.1109/LCSYS.2023.3283470
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This letter considers the problem of optimizing a finite-horizon constrained Markov decision process (CMDP) where the objective and constraints are sums of additive and multiplicative utilities. To solve this, we construct another CMDP with only additive utilities whose optimal value over a restricted set of policies is equal to that of the original CMDP. Further, we provide a finite-dimensional bilinear program (BLP) whose value equals the CMDP value and whose solution provides the optimal policy. We also suggest an algorithm to solve the proposed BLP.
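As a rough illustration of the two kinds of utility named in the abstract, the following sketch evaluates a fixed Markov policy on a small finite-horizon MDP by backward recursion. It is not the letter's construction or its bilinear program. The function name `evaluate_markov_policy`, the array layout, and the reading of "additive" as an expected sum of per-step rewards and "multiplicative" as an expected product of per-step factors are all assumptions made for this example.

```python
import numpy as np

def evaluate_markov_policy(P, r, g, policy, mu0):
    """Evaluate a Markov policy on a finite-horizon MDP (illustrative sketch).

    Assumed layout (hypothetical, not taken from the letter):
      P[t][s, a, s2]  transition probability at time t
      r[t][s, a]      per-step reward (additive utility)
      g[t][s, a]      per-step factor (multiplicative utility)
      policy[t][s, a] probability of action a in state s at time t
      mu0[s]          initial state distribution
    Returns (E[sum_t r_t], E[prod_t g_t]).
    """
    horizon = len(policy)
    n_states = mu0.shape[0]
    v_add = np.zeros(n_states)  # expected remaining sum, zero past the horizon
    v_mul = np.ones(n_states)   # expected remaining product, one past the horizon
    for t in reversed(range(horizon)):
        # (S, A, S) @ (S,) -> (S, A): expected continuation value per (s, a)
        q_add = r[t] + P[t] @ v_add
        q_mul = g[t] * (P[t] @ v_mul)
        # Average over the policy's action distribution in each state
        v_add = (policy[t] * q_add).sum(axis=1)
        v_mul = (policy[t] * q_mul).sum(axis=1)
    return float(mu0 @ v_add), float(mu0 @ v_mul)
```

Under these assumptions, a combined objective of the kind the abstract describes could be formed by adding the two returned values; a constraint could be evaluated the same way with its own `r` and `g` arrays.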
Pages: 2029-2034
Number of pages: 6