Policy Iteration Q-Learning for Linear Ito Stochastic Systems With Markovian Jumps and Its Application to Power Systems

Cited by: 2

Authors:

Ming, Zhongyang [1]

Zhang, Huaguang [1]

Wang, Yingchun [1]

Dai, Jing [2]

Affiliations:

[1] Northeastern Univ, Sch Informat Sci & Engn, Shenyang 110004, Liaoning, Peoples R China

[2] Tsinghua Univ, Energy Internet Innovat Res Inst, Beijing 100085, Peoples R China

Source:

IEEE TRANSACTIONS ON CYBERNETICS | 2024

Keywords:

Markovian jump system; neural networks (NNs); Q-learning; stochastic system; ADAPTIVE OPTIMAL-CONTROL; CONTINUOUS-TIME SYSTEMS; NONLINEAR-SYSTEMS

DOI:

10.1109/TCYB.2024.3403680

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

This article addresses the solution of continuous-time linear Ito stochastic systems with Markovian jumps using an online policy iteration (PI) approach grounded in Q-learning. Initially, a model-dependent offline algorithm, structured according to traditional optimal control strategies, is designed to solve the algebraic Riccati equation (ARE). Employing Lyapunov theory, we rigorously derive the convergence of the offline PI algorithm and the admissibility of the iterative control law through mathematical analysis. This article represents the first attempt to tackle these technical challenges. Subsequently, to address the limitations inherent in the offline algorithm, we introduce a novel online Q-learning algorithm tailored for Ito stochastic systems with Markovian jumps. The proposed Q-learning algorithm obviates the need for transition probabilities and system matrices. We provide a thorough stability analysis of the closed-loop system. Finally, the effectiveness and applicability of the proposed algorithms are demonstrated through a simulation example, underpinned by the theorems established herein.
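The offline, model-dependent policy iteration mentioned in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption, not taken from the article: a scalar two-mode system dx = (a_i x + b_i u) dt + (c_i x + d_i u) dw with generator entries pi[i][j], cost weights q_i and r_i, and the numerical values in `MODES` and `GENERATOR`. The names `evaluate_policy`, `improve_policy`, `policy_iteration`, and `are_residual` are hypothetical. Each iteration solves the coupled Lyapunov equations for the current gains, then updates the gains, which is the standard Kleinman-style scheme and may differ from the authors' algorithm.

```python
# Hypothetical two-mode scalar example: (a, b, c, d, q, r) per mode.
MODES = [(-1.0, 1.0, 0.2, 0.1, 1.0, 1.0),
         (0.5, 1.0, 0.1, 0.2, 1.0, 1.0)]
# Assumed transition-rate (generator) matrix; each row sums to zero.
GENERATOR = [[-2.0, 2.0],
             [3.0, -3.0]]


def evaluate_policy(k, modes, pi):
    """Policy evaluation: solve the two coupled scalar Lyapunov equations
    (2*a_cl + c_cl^2) p_i + sum_j pi[i][j] p_j = -(q_i + r_i k_i^2)."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    rhs = [0.0, 0.0]
    for i, (a, b, c, d, q, r) in enumerate(modes):
        a_cl = a - b * k[i]          # closed-loop drift coefficient
        c_cl = c - d * k[i]          # closed-loop diffusion coefficient
        m[i][i] = 2.0 * a_cl + c_cl ** 2 + pi[i][i]
        m[i][1 - i] = pi[i][1 - i]
        rhs[i] = -(q + r * k[i] ** 2)
    # 2x2 linear solve by Cramer's rule.
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    p1 = (rhs[0] * m[1][1] - m[0][1] * rhs[1]) / det
    p2 = (m[0][0] * rhs[1] - m[1][0] * rhs[0]) / det
    return [p1, p2]


def improve_policy(p, modes):
    """Policy improvement: k_i = (b_i + c_i d_i) p_i / (r_i + d_i^2 p_i)."""
    return [(b + c * d) * p[i] / (r + d * d * p[i])
            for i, (a, b, c, d, q, r) in enumerate(modes)]


def policy_iteration(k0, modes, pi, iters=30):
    """Alternate evaluation and improvement, starting from gains k0
    that are assumed to be stabilizing (admissible)."""
    k = list(k0)
    p = evaluate_policy(k, modes, pi)
    for _ in range(iters):
        k = improve_policy(p, modes)
        p = evaluate_policy(k, modes, pi)
    return p, k


def are_residual(p, modes, pi):
    """Left-hand side of the coupled algebraic Riccati equations at p."""
    res = []
    for i, (a, b, c, d, q, r) in enumerate(modes):
        coupling = sum(pi[i][j] * p[j] for j in range(2))
        s = (b + c * d) * p[i]
        res.append(2.0 * a * p[i] + c * c * p[i] + q + coupling
                   - s * s / (r + d * d * p[i]))
    return res


if __name__ == "__main__":
    p, k = policy_iteration([0.0, 2.0], MODES, GENERATOR)
    print("p =", p, "k =", k, "residual =", are_residual(p, MODES, GENERATOR))
```

The initial gains `[0.0, 2.0]` are chosen so that the assumed closed loop is mean-square stable. This sketch needs the model parameters and transition rates, which is the dependence the online Q-learning algorithm described in the abstract is designed to remove.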


Pages: 1 / 10

Page count: 10

50 items in total

[1] Policy Iteration Q-Learning for Linear Ito Stochastic Systems With Markovian Jumps and its Application to Power Systems
Ming, Zhongyang
Zhang, Huaguang
Wang, Yingchun
Dai, Jing
IEEE TRANSACTIONS ON CYBERNETICS, 2024, 54 (12) : 7804 - 7813
[2] Base on Q-Learning Pareto Optimality for Linear Ito Stochastic Systems With Markovian Jumps
Ming, Zhongyang
Zhang, Huaguang
Li, Weihua
Luo, Yanhong
IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2024, 21 (01) : 965 - 975
[3] Based on Q-Learning Optimal Tracking Control Schemes for Linear Itô Stochastic Systems With Markovian Jumps
Li, Mei
Sun, Jiayue
Zhang, Huaguang
Ming, Zhongyang
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2023, 70 (03) : 1094 - 1098
[4] Data-driven policy iteration algorithm for optimal control of continuous-time Ito stochastic systems with Markovian jumps
Song, Jun
He, Shuping
Liu, Fei
Niu, Yugang
Ding, Zhengtao
IET CONTROL THEORY AND APPLICATIONS, 2016, 10 (12) : 1431 - 1439
[5] STOCHASTIC CONTROLLABILITY OF LINEAR-SYSTEMS WITH MARKOVIAN JUMPS
MARITON, M
AUTOMATICA, 1987, 23 (06) : 783 - 785
[6] Q-learning and policy iteration algorithms for stochastic shortest path problems
Huizhen Yu
Dimitri P. Bertsekas
Annals of Operations Research, 2013, 208 : 95 - 132
[7] Integral Q-learning and explorized policy iteration for adaptive optimal control of continuous-time linear systems
Lee, Jae Young
Park, Jin Bae
Choi, Yoon Ho
AUTOMATICA, 2012, 48 (11) : 2850 - 2859
[8] Q-learning and policy iteration algorithms for stochastic shortest path problems
Yu, Huizhen
Bertsekas, Dimitri P.
ANNALS OF OPERATIONS RESEARCH, 2013, 208 (01) : 95 - 132
[9] Online Q-learning for stochastic linear systems with state and control dependent noise
Zhu, Hongxu
Wang, Wei
Wang, Xiaoliang
Wu, Shufan
Sun, Ran
APPLIED SOFT COMPUTING, 2024, 167
[10] A novel policy iteration based deterministic Q-learning for discrete-time nonlinear systems
Wei QingLai
Liu DeRong
SCIENCE CHINA-INFORMATION SCIENCES, 2015, 58 (12) : 1 - 15
