Policy Iteration Q-Learning for Linear Ito Stochastic Systems With Markovian Jumps and Its Application to Power Systems

Cited by: 2
Authors:
Ming, Zhongyang [1 ]
Zhang, Huaguang [1 ]
Wang, Yingchun [1 ]
Dai, Jing [2 ]
Affiliations:
[1] Northeastern Univ, Sch Informat Sci & Engn, Shenyang 110004, Liaoning, Peoples R China
[2] Tsinghua Univ, Energy Internet Innovat Res Inst, Beijing 100085, Peoples R China
Keywords:
Markovian jump system; neural networks (NNs); Q-learning; stochastic system; ADAPTIVE OPTIMAL-CONTROL; CONTINUOUS-TIME SYSTEMS; NONLINEAR-SYSTEMS;
DOI: 10.1109/TCYB.2024.3403680
Chinese Library Classification (CLC): TP [automation technology, computer technology]
Discipline classification code: 0812
Abstract
This article addresses the optimal control of continuous-time linear Ito stochastic systems with Markovian jumps using an online policy iteration (PI) approach grounded in Q-learning. Initially, a model-dependent offline algorithm, structured according to traditional optimal control strategies, is designed to solve the algebraic Riccati equation (ARE). Employing Lyapunov theory, we rigorously establish the convergence of the offline PI algorithm and the admissibility of the iterative control law; this article represents the first attempt to tackle these technical challenges. Subsequently, to overcome the limitations inherent in the offline algorithm, we introduce a novel online Q-learning algorithm tailored to Ito stochastic systems with Markovian jumps. The proposed Q-learning algorithm obviates the need for transition probabilities and system matrices. We provide a thorough stability analysis of the closed-loop system. Finally, the effectiveness and applicability of the proposed algorithms are demonstrated through a simulation example, underpinned by the theorems established herein.
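As a rough illustration of the offline, model-dependent PI step described in the abstract, the sketch below alternates policy evaluation (coupled Lyapunov equations) and policy improvement for a Markovian jump Ito system of the assumed form dx = (A_i x + B_i u) dt + C_i x dw with generator matrix [pi_ij]. The model structure, the two-mode example data, the function names, and the stopping tolerance are illustrative assumptions, not taken from the paper; the paper's online Q-learning algorithm would replace the model-based evaluation step with data-driven estimates.

```python
# Minimal sketch (assumed model and data): offline policy iteration for the
# coupled AREs of a Markovian jump Ito system, not the paper's exact algorithm.
import numpy as np

def evaluate_policy(A, B, C, Q, R, Pi, K):
    """Policy evaluation: solve the coupled Lyapunov equations for P_1..P_N given gains K_i."""
    N, n = len(A), A[0].shape[0]
    M = np.zeros((N * n * n, N * n * n))
    rhs = np.zeros(N * n * n)
    I = np.eye(n)
    for i in range(N):
        Acl = A[i] - B[i] @ K[i]                      # closed-loop drift with u = -K_i x
        # vec(Acl^T P_i + P_i Acl + C_i^T P_i C_i) + pi_ii vec(P_i)
        diag = (np.kron(I, Acl.T) + np.kron(Acl.T, I)
                + np.kron(C[i].T, C[i].T) + Pi[i, i] * np.eye(n * n))
        M[i*n*n:(i+1)*n*n, i*n*n:(i+1)*n*n] = diag
        for j in range(N):
            if j != i:                                # coupling through the jump generator
                M[i*n*n:(i+1)*n*n, j*n*n:(j+1)*n*n] = Pi[i, j] * np.eye(n * n)
        rhs[i*n*n:(i+1)*n*n] = -(Q[i] + K[i].T @ R[i] @ K[i]).reshape(-1)
    p = np.linalg.solve(M, rhs)
    return [0.5 * (X + X.T)                           # symmetrize for numerical cleanliness
            for X in (p[i*n*n:(i+1)*n*n].reshape(n, n) for i in range(N))]

def policy_iteration(A, B, C, Q, R, Pi, K0, max_iter=50, tol=1e-10):
    """Offline PI: alternate evaluation with the improvement step K_i = R_i^{-1} B_i^T P_i."""
    K, P_prev = [k.copy() for k in K0], None
    for _ in range(max_iter):
        P = evaluate_policy(A, B, C, Q, R, Pi, K)
        K = [np.linalg.solve(R[i], B[i].T @ P[i]) for i in range(len(A))]
        if P_prev is not None and max(np.linalg.norm(P[i] - P_prev[i]) for i in range(len(A))) < tol:
            break
        P_prev = P
    return P, K

# Two-mode scalar example (assumed data); K0 is assumed to be mean-square stabilizing.
A  = [np.array([[0.5]]),  np.array([[-0.2]])]
B  = [np.array([[1.0]]),  np.array([[1.0]])]
C  = [np.array([[0.1]]),  np.array([[0.2]])]
Q  = [np.eye(1), np.eye(1)]
R  = [np.eye(1), np.eye(1)]
Pi = np.array([[-1.0, 1.0], [2.0, -2.0]])             # transition-rate (generator) matrix
K0 = [np.array([[2.0]]), np.array([[2.0]])]
P, K = policy_iteration(A, B, C, Q, R, Pi, K0)
print("P:", [p.item() for p in P], "K:", [k.item() for k in K])
```

Under the usual stabilizability assumptions, such a PI loop converges monotonically to the solution of the coupled AREs; the online Q-learning algorithm of the paper is stated to achieve this without knowledge of the system matrices or the transition probabilities, which the sketch above still requires.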
Pages: 1-10 (10 pages)
Related papers (50 in total):
  • [21] Safe Q-learning for continuous-time linear systems
    Bandyopadhyay, Soutrik
    Bhasin, Shubhendu
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023, : 241 - 246
  • [22] Policy iteration based Q-learning for linear nonzero-sum quadratic differential games
    Li, Xinxing
    Peng, Zhihong
    Liang, Li
    Zha, Wenzhong
    SCIENCE CHINA-INFORMATION SCIENCES, 2019, 62 (05) : 195 - 213
  • [25] Application of Q-learning with temperature variation for bidding strategies in market based power systems
    Naghibi-Sistani, MB
    Akbarzadeh-Tootoonchi, MR
    Bayaz, MHJD
    Rajabi-Mashhadi, H
    ENERGY CONVERSION AND MANAGEMENT, 2006, 47 (11-12) : 1529 - 1538
  • [26] Policy Iteration Q-Learning for Data-Based Two-Player Zero-Sum Game of Linear Discrete-Time Systems
    Luo, Biao
    Yang, Yin
    Liu, Derong
    IEEE TRANSACTIONS ON CYBERNETICS, 2021, 51 (07) : 3630 - 3640
  • [27] Safety-Critical Optimal Control of Discrete-Time Non-Linear Systems via Policy Iteration-Based Q-Learning
    Long, Lijun
    Liu, Xiaomei
    Huang, Xiaomin
    INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2025,
  • [28] Analysis of linear asynchronous hybrid stochastic systems and its application to multi-agent systems with Markovian switching topologies
    Luo, Shixian
    Deng, Feiqi
    Zhao, Xueyan
    INTERNATIONAL JOURNAL OF SYSTEMS SCIENCE, 2019, 50 (09) : 1757 - 1770
  • [29] Data-driven tracking control approach for linear systems by on-policy Q-learning approach
    Zhang, Yihan
    Mao, Zhenfei
    Li, Jinna
    16TH IEEE INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION, ROBOTICS AND VISION (ICARCV 2020), 2020, : 1066 - 1070
  • [30] Stochastic Optimal CPS Relaxed Control Methodology for Interconnected Power Systems Using Q-Learning Method
    Yu, Tao
    Zhou, Bin
    Chan, Ka Wing
    Lu, En
    JOURNAL OF ENERGY ENGINEERING, 2011, 137 (03) : 116 - 129