Stability-constrained Markov Decision Processes using MPC

Cited by: 6
Authors
Zanon, Mario [1]
Gros, Sebastien [2]
Palladino, Michele [3]
Affiliations
[1] IMT Sch Adv Studies Lucca, Piazza San Francesco 19, I-55100 Lucca, Italy
[2] NTNU, Trondheim, Norway
[3] Univ Aquila, Dept Informat Engn Comp Sci & Math DISIM, via Vetoio, I-67100 L'Aquila, Italy
Keywords
Markov Decision Processes; Model Predictive Control; Stability; Safe reinforcement learning; MODEL-PREDICTIVE CONTROL; SYSTEMS;
DOI
10.1016/j.automatica.2022.110399
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we consider solving discounted Markov Decision Processes (MDPs) under the constraint that the resulting policy is stabilizing. In practice, MDPs are solved based on some form of policy approximation. We leverage recent results proposing to use Model Predictive Control (MPC) as a structured approximator in the context of Reinforcement Learning, which makes it possible to introduce stability requirements directly inside the MPC-based policy. This restricts the solution of the MDP to stabilizing policies by construction. Because the stability theory for MPC is most mature for the undiscounted case, we first show that stable discounted MDPs can be reformulated as undiscounted ones. This observation entails that the undiscounted MPC-based policy with stability guarantees produces the optimal policy for the discounted MDP if that policy is stable, and the best stabilizing policy otherwise. (C) 2022 Elsevier Ltd. All rights reserved.
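Illustrative note (not taken from the paper): the abstract distinguishes the case where the optimal policy of the discounted MDP is stable from the case where it is not. The sketch below shows, under assumptions, how the second case can arise. It uses a hypothetical scalar linear system `s+ = a*s + b*u` with stage cost `q*s^2 + r*u^2`; the function name and all numerical values are chosen here for illustration only. It uses the standard discounted Riccati recursion, not the MPC-based method of the paper.

```python
def discounted_lqr_gain(a, b, q, r, gamma, iters=2000):
    """Gain k of the policy u = -k*s that is optimal for the scalar
    discounted LQR problem (assumed toy model, not from the paper)."""
    p = 0.0  # quadratic value-function coefficient, V(s) = p * s^2
    for _ in range(iters):
        k = gamma * a * b * p / (r + gamma * b * b * p)
        # Discounted Riccati update: p = q + gamma*a^2*p - gamma*a*b*p*k
        p = q + gamma * a * p * (a - b * k)
    return gamma * a * b * p / (r + gamma * b * b * p)


def closed_loop_pole(a, b, q, r, gamma):
    """Closed-loop dynamics coefficient a - b*k; stable iff |a - b*k| < 1."""
    return a - b * discounted_lqr_gain(a, b, q, r, gamma)


# Hypothetical values: open-loop unstable plant, cheap state cost.
a, b, q, r = 1.5, 1.0, 0.01, 1.0

pole_discounted = closed_loop_pole(a, b, q, r, gamma=0.3)
pole_undiscounted = closed_loop_pole(a, b, q, r, gamma=1.0)

print("discounted (gamma=0.3) closed-loop pole:", pole_discounted)
print("undiscounted (gamma=1.0) closed-loop pole:", pole_undiscounted)
print("discounted policy stabilizing:", abs(pole_discounted) < 1.0)
print("undiscounted policy stabilizing:", abs(pole_undiscounted) < 1.0)
```

In this toy setting, strong discounting makes the optimal policy apply almost no control, so the closed loop stays unstable, while the undiscounted optimum is stabilizing. This is the kind of situation in which a stability requirement on the policy changes the solution.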
Pages: 9
Related papers
50 in total
  • [31] Joint chance-constrained Markov decision processes
    V Varagapriya
    Vikas Vikram Singh
    Abdel Lisser
    Annals of Operations Research, 2023, 322 : 1013 - 1035
  • [32] Constrained discounted semi-Markov decision processes
    Feinberg, EA
    MARKOV PROCESSES AND CONTROLLED MARKOV CHAINS, 2002, : 233 - 244
  • [33] Constrained Risk-Averse Markov Decision Processes
    Ahmadi, Mohamadreza
    Rosolia, Ugo
    Ingham, Michel D.
    Murray, Richard M.
    Ames, Aaron D.
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 11718 - 11725
  • [34] Semi-Infinitely Constrained Markov Decision Processes
    Zhang, Liangyu
    Peng, Yang
    Yang, Wenhao
    Zhang, Zhihua
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [35] Stability-Constrained Power System Scheduling: A Review
    Luo, Jianqiang
    Teng, Fei
    Bu, Siqi
    IEEE ACCESS, 2020, 8 : 219331 - 219343
  • [36] Trading Performance for Stability in Markov Decision Processes
    Brazdil, Tomas
    Chatterjee, Krishnendu
    Forejt, Vojtech
    Kucera, Antonin
    2013 28TH ANNUAL IEEE/ACM SYMPOSIUM ON LOGIC IN COMPUTER SCIENCE (LICS), 2013, : 331 - 340
  • [37] Trading performance for stability in Markov decision processes
    Brazdil, Tomas
    Chatterjee, Krishnendu
    Forejt, Vojtech
    Kucera, Antonin
    JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2017, 84 : 144 - 170
  • [38] Stability Estimation of Transient Markov Decision Processes
    Gordienko, Evgueni
    Martinez, Jaime
    Ruiz de Chavez, Juan
    XI SYMPOSIUM ON PROBABILITY AND STOCHASTIC PROCESSES, 2015, 69 : 157 - 176
  • [39] Transient stability-constrained maximum allowable transfer
    Bettiol, AL
    Wehenkel, L
    Pavella, M
    IEEE TRANSACTIONS ON POWER SYSTEMS, 1999, 14 (02) : 654 - 659
  • [40] Frequency Stability-Constrained Unit Commitment: Tight Approximation Using Bernstein Polynomials
    Zhou, Bo
    Jiang, Ruiwei
    Shen, Siqian
    IEEE TRANSACTIONS ON POWER SYSTEMS, 2024, 39 (04) : 5907 - 5919