Policy Iteration Q-Learning for Linear Itô Stochastic Systems With Markovian Jumps and Its Application to Power Systems

Cited by: 2
Authors
Ming, Zhongyang [1 ]
Zhang, Huaguang [1 ]
Wang, Yingchun [1 ]
Dai, Jing [2 ]
Affiliations
[1] Northeastern Univ, Sch Informat Sci & Engn, Shenyang 110004, Liaoning, Peoples R China
[2] Tsinghua Univ, Energy Internet Innovat Res Inst, Beijing 100085, Peoples R China
Keywords
Markovian jump system; neural networks (NNs); Q-learning; stochastic system; ADAPTIVE OPTIMAL-CONTROL; CONTINUOUS-TIME SYSTEMS; NONLINEAR-SYSTEMS;
DOI
10.1109/TCYB.2024.3403680
CLC Classification Number
TP [Automation Technology, Computer Technology];
Subject Classification Number
0812;
Abstract
This article addresses the optimal control of continuous-time linear Itô stochastic systems with Markovian jumps via an online policy iteration (PI) approach grounded in Q-learning. Initially, a model-dependent offline algorithm, structured according to traditional optimal control strategies, is designed to solve the algebraic Riccati equation (ARE). Employing Lyapunov theory, we rigorously derive the convergence of the offline PI algorithm and the admissibility of the iterative control law. This article represents the first attempt to tackle these technical challenges. Subsequently, to address the limitations inherent in the offline algorithm, we introduce a novel online Q-learning algorithm tailored for Itô stochastic systems with Markovian jumps. The proposed Q-learning algorithm obviates the need for transition probabilities and system matrices. We provide a thorough stability analysis of the closed-loop system. Finally, the effectiveness and applicability of the proposed algorithms are demonstrated through a simulation example, underpinned by the theorems established herein.
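The offline PI step summarized in the abstract — iteratively solving the ARE through a sequence of policy evaluations and improvements — can be illustrated in its deterministic simplification by Kleinman's classical policy iteration. The sketch below is only an illustration under that simplifying assumption: it omits the Itô diffusion terms and the Markovian jump coupling that the paper actually treats, and the matrices chosen are arbitrary examples.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

def policy_iteration_lqr(A, B, Q, R, K0, iters=20):
    """Kleinman policy iteration for the deterministic continuous-time ARE.

    Policy evaluation: solve the Lyapunov equation
        (A - B K)^T P + P (A - B K) = -(Q + K^T R K)
    Policy improvement: K <- R^{-1} B^T P.
    Requires an initial stabilizing (admissible) gain K0.
    """
    K = K0
    for _ in range(iters):
        Ac = A - B @ K                      # closed-loop matrix under current policy
        M = Q + K.T @ R @ K                 # stage cost of the current policy
        P = solve_continuous_lyapunov(Ac.T, -M)   # policy evaluation
        K = np.linalg.solve(R, B.T @ P)     # policy improvement
    return P, K

# Toy example: the open-loop system is stable, so K0 = 0 is admissible.
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
P, K = policy_iteration_lqr(A, B, Q, R, K0=np.zeros((1, 2)))

# The iterates converge to the solution of the ARE.
P_are = solve_continuous_are(A, B, Q, R)
```

The paper's online Q-learning variant removes the dependence on `A`, `B`, and the jump transition probabilities that this model-based sketch requires, replacing the Lyapunov solves with estimates built from measured state and input data.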
Pages: 1 - 10
Number of pages: 10
Related Papers
50 records in total
  • [1] Policy Iteration Q-Learning for Linear Itô Stochastic Systems With Markovian Jumps and Its Application to Power Systems
    Ming, Zhongyang
    Zhang, Huaguang
    Wang, Yingchun
    Dai, Jing
    IEEE TRANSACTIONS ON CYBERNETICS, 2024, 54 (12) : 7804 - 7813
  • [2] Based on Q-Learning Pareto Optimality for Linear Itô Stochastic Systems With Markovian Jumps
    Ming, Zhongyang
    Zhang, Huaguang
    Li, Weihua
    Luo, Yanhong
    IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2024, 21 (01) : 965 - 975
  • [3] Based on Q-Learning Optimal Tracking Control Schemes for Linear Itô Stochastic Systems With Markovian Jumps
    Li, Mei
    Sun, Jiayue
    Zhang, Huaguang
    Ming, Zhongyang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2023, 70 (03) : 1094 - 1098
  • [4] Data-driven policy iteration algorithm for optimal control of continuous-time Itô stochastic systems with Markovian jumps
    Song, Jun
    He, Shuping
    Liu, Fei
    Niu, Yugang
    Ding, Zhengtao
    IET CONTROL THEORY AND APPLICATIONS, 2016, 10 (12) : 1431 - 1439
  • [5] Stochastic Controllability of Linear Systems With Markovian Jumps
    Mariton, M.
    AUTOMATICA, 1987, 23 (06) : 783 - 785
  • [6] Q-learning and policy iteration algorithms for stochastic shortest path problems
    Yu, Huizhen
    Bertsekas, Dimitri P.
    ANNALS OF OPERATIONS RESEARCH, 2013, 208 (01) : 95 - 132
  • [7] Integral Q-learning and explorized policy iteration for adaptive optimal control of continuous-time linear systems
    Lee, Jae Young
    Park, Jin Bae
    Choi, Yoon Ho
    AUTOMATICA, 2012, 48 (11) : 2850 - 2859
  • [8] Online Q-learning for stochastic linear systems with state and control dependent noise
    Zhu, Hongxu
    Wang, Wei
    Wang, Xiaoliang
    Wu, Shufan
    Sun, Ran
    APPLIED SOFT COMPUTING, 2024, 167
  • [9] A novel policy iteration based deterministic Q-learning for discrete-time nonlinear systems
    Wei QingLai
    Liu DeRong
    SCIENCE CHINA-INFORMATION SCIENCES, 2015, 58 (12) : 1 - 15