Zero-sum game-based optimal control for discrete-time Markov jump systems: A parallel off-policy Q-learning method

Cited by: 1
Authors
Wang, Yun [1]
Fang, Tian [1]
Kong, Qingkai [1]
Li, Feng [1]
Affiliations
[1] Anhui Univ Technol, Anhui Prov Key Lab Power Elect & Mot Control, Maanshan 243002, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Markov jump systems; Optimal control; Q-learning method; Game-coupled algebraic Riccati equation; H-INFINITY CONTROL; LINEAR-SYSTEMS; CONTROL DESIGN; NETWORK;
DOI
10.1016/j.amc.2023.128462
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
This paper solves the zero-sum game problem for linear discrete-time Markov jump systems using two novel model-free reinforcement learning algorithms: on-policy Q-learning and off-policy Q-learning. First, the game-coupled algebraic Riccati equation is derived within the zero-sum game framework. On this basis, a subsystem transformation technique is employed to decouple the jumping modes. Then, a model-free on-policy Q-learning algorithm is introduced in the zero-sum game architecture to obtain the optimal control gain from measured system data. However, probing noise introduces bias into the on-policy algorithm, so an off-policy Q-learning algorithm is proposed to eliminate the effect of the probing noise. Subsequently, the convergence of the proposed methods is discussed. Finally, an inverted pendulum system is used to verify the validity of the proposed methods.
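To make the game Riccati equation mentioned in the abstract more concrete, the following is a minimal sketch only, not the paper's algorithm. It assumes a scalar, single-mode system x[k+1] = a*x[k] + b*u[k] + d*w[k] with stage cost q*x^2 + r*u^2 - gamma^2*w^2, and it uses the known model, whereas the paper's methods are model-free and handle multiple jumping modes. The function name, parameters, and numeric values are all illustrative assumptions.

```python
def solve_scalar_game_riccati(a, b, d, q, r, gamma, iters=200, tol=1e-12):
    """Value iteration on a scalar zero-sum game Riccati equation.

    Assumed (hypothetical) setup: x[k+1] = a*x + b*u + d*w, where u minimizes
    and w maximizes the sum of q*x^2 + r*u^2 - gamma^2*w^2.
    The value function is taken as V(x) = p * x^2.
    """
    g2 = gamma ** 2
    p = 0.0
    for _ in range(iters):
        # Common denominator from solving the 2x2 stationarity conditions
        # of the Q-function in (u, w).
        denom = r * g2 + b * b * p * g2 - r * d * d * p
        # Riccati update after substituting the saddle-point policies.
        p_next = q + a * a * p * r * g2 / denom
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    denom = r * g2 + b * b * p * g2 - r * d * d * p
    K = a * b * p * g2 / denom  # control policy:      u = -K * x
    L = r * a * d * p / denom   # disturbance policy:  w =  L * x
    return p, K, L


# Illustrative numbers only (an open-loop unstable plant, a = 1.1).
p, K, L = solve_scalar_game_riccati(a=1.1, b=1.0, d=0.5, q=1.0, r=1.0, gamma=2.0)
closed_loop = 1.1 - 1.0 * K + 0.5 * L
print(p, K, L, closed_loop)
```

In a data-driven variant, the quantities computed here from (a, b, d) would instead be estimated from measured state, input, and disturbance samples; that step is what the on-policy and off-policy Q-learning algorithms in the paper address.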
Pages: 12
Related papers
50 in total
  • [1] H∞ Control for Discrete-time Linear Systems by Integrating Off-policy Q-learning and Zero-sum Game
    Li, Jinna
    Ding, Zhengtao
    Yang, Chunyu
    Niu, Hong
    2018 IEEE 14TH INTERNATIONAL CONFERENCE ON CONTROL AND AUTOMATION (ICCA), 2018, : 817 - 822
  • [2] Off-Policy Interleaved Q-Learning: Optimal Control for Affine Nonlinear Discrete-Time Systems
    Li, Jinna
    Chai, Tianyou
    Lewis, Frank L.
    Ding, Zhengtao
    Jiang, Yi
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30 (05) : 1308 - 1320
  • [3] H∞ Tracking learning control for discrete-time Markov jump systems: A parallel off-policy reinforcement learning
    Zhang, Xuewen
    Xia, Jianwei
    Wang, Jing
    Chen, Xiangyong
    Shen, Hao
    JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS, 2023, 360 (18) : 14878 - 14890
  • [4] Non-zero-sum games of discrete-time Markov jump systems with unknown dynamics: An off-policy reinforcement learning method
    Zhang, Xuewen
    Shen, Hao
    Li, Feng
    Wang, Jing
    INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2024, 34 (02) : 949 - 968
  • [5] Nearly Optimal Control for Mixed Zero-Sum Game Based on Off-Policy Integral Reinforcement Learning
    Song, Ruizhuo
    Yang, Gaofu
    Lewis, Frank L.
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (02) : 2793 - 2804
  • [6] Policy Iteration Q-Learning for Data-Based Two-Player Zero-Sum Game of Linear Discrete-Time Systems
    Luo, Biao
    Yang, Yin
    Liu, Derong
    IEEE TRANSACTIONS ON CYBERNETICS, 2021, 51 (07) : 3630 - 3640
  • [7] Optimal tracking control for discrete-time systems by model-free off-policy Q-learning approach
    Li, Jinna
    Yuan, Decheng
    Ding, Zhengtao
    2017 11TH ASIAN CONTROL CONFERENCE (ASCC), 2017, : 7 - 12
  • [8] Off-policy inverse Q-learning for discrete-time antagonistic unknown systems
    Lian, Bosen
    Xue, Wenqian
    Xie, Yijing
    Lewis, Frank L.
    Davoudi, Ali
    AUTOMATICA, 2023, 155
  • [9] Data-based discrete-time two-player zero-sum delayed game via policy iteration Q-learning Method
    Jiang, Zongyang
    Zhang, Haiying
    Xiao, Yu
    NEUROCOMPUTING, 2025, 631
  • [10] Stochastic Zero-Sum Differential Games and H∞ Control of Discrete-time Markov Jump Systems
    Zhou Haiying
    Zhu Huainian
    Zhang Chengke
    26TH CHINESE CONTROL AND DECISION CONFERENCE (2014 CCDC), 2014, : 151 - 156