Stability-constrained Markov Decision Processes using MPC

Cited by: 6
Authors
Zanon, Mario [1 ]
Gros, Sebastien [2 ]
Palladino, Michele [3 ]
Affiliations
[1] IMT Sch Adv Studies Lucca, Piazza San Francesco 19, I-55100 Lucca, Italy
[2] NTNU, Trondheim, Norway
[3] Univ Aquila, Dept Informat Engn Comp Sci & Math DISIM, via Vetoio, I-67100 L'Aquila, Italy
Keywords
Markov Decision Processes; Model Predictive Control; Stability; Safe reinforcement learning; MODEL-PREDICTIVE CONTROL; SYSTEMS;
DOI
10.1016/j.automatica.2022.110399
Chinese Library Classification
TP [Automation technology; computer technology];
Discipline code
0812 ;
Abstract
In this paper, we consider solving discounted Markov Decision Processes (MDPs) under the constraint that the resulting policy is stabilizing. In practice, MDPs are solved based on some form of policy approximation. We leverage recent results proposing to use Model Predictive Control (MPC) as a structured approximator in the context of Reinforcement Learning, which makes it possible to introduce stability requirements directly inside the MPC-based policy. This restricts the solution of the MDP to stabilizing policies by construction. Because the stability theory for MPC is most mature in the undiscounted case, we first show in this paper that stable discounted MDPs can be reformulated as undiscounted ones. This observation entails that the undiscounted MPC-based policy with stability guarantees produces the optimal policy for the discounted MDP if it is stable, and the best stabilizing policy otherwise. (C) 2022 Elsevier Ltd. All rights reserved.
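The discounted-to-undiscounted reformulation the abstract alludes to can be illustrated with the classic textbook construction (which is not necessarily the construction used in this paper): a discounted MDP with factor gamma is equivalent to an undiscounted MDP in which, at every step, the process jumps to a zero-reward absorbing state with probability 1 - gamma. The sketch below verifies this numerically on a hypothetical 2-state, 2-action MDP invented for illustration.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (illustrative only, not from the paper).
# P[a][s, s'] : transition probabilities under action a
# R[a][s]     : stage reward for taking action a in state s
gamma = 0.9
P = [np.array([[0.8, 0.2], [0.3, 0.7]]),
     np.array([[0.5, 0.5], [0.9, 0.1]])]
R = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]

def value_iteration(P, R, gamma, iters=2000):
    """Standard value iteration: V(s) = max_a R(s,a) + gamma * E[V(s')]."""
    V = np.zeros(P[0].shape[0])
    for _ in range(iters):
        V = np.max([R[a] + gamma * P[a] @ V for a in range(len(P))], axis=0)
    return V

# Optimal value function of the original discounted MDP.
V_disc = value_iteration(P, R, gamma)

# Augmented undiscounted MDP: add an absorbing zero-reward state (index 2)
# reached with probability 1 - gamma from every state, and scale the
# original transitions by gamma so each row still sums to one.
P_aug = [np.block([[gamma * Pa, (1 - gamma) * np.ones((2, 1))],
                   [np.zeros((1, 2)), np.ones((1, 1))]]) for Pa in P]
R_aug = [np.append(Ra, 0.0) for Ra in R]

# Run value iteration with NO discounting; it still converges because the
# absorbing state plays the role of the discount.
V_undisc = value_iteration(P_aug, R_aug, 1.0)

print(V_disc, V_undisc[:2])  # the two value functions coincide
```

The Bellman updates of the two formulations are term-by-term identical, which is why the values match; the paper's own reformulation additionally has to account for stability, which this generic construction does not address.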
Pages: 9
Related papers
50 total
  • [41] Constrained Multiagent Markov Decision Processes: a Taxonomy of Problems and Algorithms
    de Nijs, Frits
    Walraven, Erwin
    de Weerdt, Mathijs M.
    Spaan, Matthijs T. J.
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2021, 70 : 955 - 1001
  • [42] Constrained Markov Decision Processes with Total Expected Cost Criteria
    Altman, Eitan
    Boularouk, Said
    Josselin, Didier
    PROCEEDINGS OF THE 12TH EAI INTERNATIONAL CONFERENCE ON PERFORMANCE EVALUATION METHODOLOGIES AND TOOLS (VALUETOOLS 2019), 2019, : 191 - 192
  • [43] Learning algorithms for finite horizon constrained markov decision processes
    Mittal, A.
    Hemachandra, N.
    JOURNAL OF INDUSTRIAL AND MANAGEMENT OPTIMIZATION, 2007, 3 (03) : 429 - 444
  • [44] Recursively-Constrained Partially Observable Markov Decision Processes
    Ho, Qi Heng
    Becker, Tyler
    Kraske, Benjamin
    Laouar, Zakariya
    Feather, Martin S.
    Rossi, Federico
    Lahijanian, Morteza
    Sunberg, Zachary
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2024, 244 : 1658 - 1680
  • [45] An actor-critic algorithm for constrained Markov decision processes
    Borkar, VS
    SYSTEMS & CONTROL LETTERS, 2005, 54 (03) : 207 - 213
  • [46] Non-randomized policies for constrained Markov decision processes
    Chen, Richard C.
    Feinberg, Eugene A.
    MATHEMATICAL METHODS OF OPERATIONS RESEARCH, 2007, 66 (01) : 165 - 179
  • [48] CONSTRAINED MARKOV DECISION PROCESSES WITH EXPECTED TOTAL REWARD CRITERIA
    Jaskiewicz, Anna
    Nowak, Andrzej S.
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2019, 57 (05) : 3118 - 3136
  • [49] Potential based optimization algorithm of constrained Markov decision processes
    Li Yanjie
    Yin Baoqun
    Xi Hongsheng
PROCEEDINGS OF THE 24TH CHINESE CONTROL CONFERENCE, VOLS 1 AND 2, 2005, : 433 - 436
  • [50] Constrained semi-Markov decision processes with average rewards
    Feinberg, E.A.
ZOR - ZEITSCHRIFT FUER OPERATIONS RESEARCH, 1994, 40 (03):