H∞ Control for Discrete-Time Multi-Player Systems via Off-Policy Q-Learning

Cited by: 4
Authors
Li, Jinna [1 ,2 ]
Xiao, Zhenfei [1 ]
Affiliations
[1] Liaoning Shihua Univ, Sch Informat & Control Engn, Fushun 113001, Liaoning, Peoples R China
[2] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110819, Peoples R China
Source
IEEE ACCESS | 2020, Vol. 8, Issue 08
Funding
National Natural Science Foundation of China;
Keywords
H-infinity control; off-policy Q-learning; game theory; Nash equilibrium; ZERO-SUM GAMES; STATIC OUTPUT-FEEDBACK; DIFFERENTIAL GRAPHICAL GAMES; OPTIMAL TRACKING CONTROL; ADAPTIVE OPTIMAL-CONTROL; POLE ASSIGNMENT; LINEAR-SYSTEMS; SYNCHRONIZATION; ALGORITHM; DESIGNS;
DOI
10.1109/ACCESS.2020.2970760
CLC Classification Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
This paper presents a novel off-policy game Q-learning algorithm to solve the H∞ control problem for discrete-time linear multi-player systems with completely unknown system dynamics. The primary contribution of this paper is that the Q-learning strategy employed in the proposed algorithm is implemented via off-policy policy iteration rather than on-policy learning, since off-policy learning has some well-known advantages over on-policy learning. All players strive together to minimize their common performance index while counteracting the disturbance, which tries to maximize that index, and they finally reach the Nash equilibrium of the game, at which the disturbance attenuation condition is satisfied. To find the Nash equilibrium solution, the H∞ control problem is first transformed into an optimal control problem. Then an off-policy Q-learning algorithm is put forward within the typical adaptive dynamic programming (ADP) and game-theoretic architecture, such that the control policies of all players can be learned using only measured data. More importantly, a rigorous proof is presented that the proposed off-policy game Q-learning algorithm introduces no bias into the solution of the Nash equilibrium. Comparative simulation results are provided to verify the effectiveness and demonstrate the advantages of the proposed method.
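Because the record above describes the algorithm only in prose, a rough illustration may help. The following is a minimal sketch, in Python, of off-policy Q-learning policy iteration for the simplest two-player instance of the game described in the abstract: one controller minimizing a quadratic index against one disturbance maximizing it. Everything concrete here is an assumption for illustration only: the plant matrices A, B, E, the weights Qx and Ru, the attenuation level gamma, the data sizes, and the batch least-squares evaluation are all invented, and the sketch does not reproduce the paper's multi-player formulation or its unbiasedness proof.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed plant x_{k+1} = A x_k + B u_k + E w_k; used only to simulate data,
# the learner itself never reads A, B, or E (model-free, as in the abstract).
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
E = np.array([[1.0], [0.0]])
n, m, q = 2, 1, 1
Qx, Ru, gamma = np.eye(n), np.eye(m), 5.0   # gamma: assumed attenuation level

def utility(x, u, w):
    """One-step utility of the zero-sum game (controller min, disturbance max)."""
    return x @ Qx @ x + u @ Ru @ u - gamma**2 * (w @ w)

# --- collect a batch of data once, under exploratory behavior policies ------
N = 400
data = []
x = rng.standard_normal(n)
for _ in range(N):
    u = 0.3 * rng.standard_normal(m)        # exploratory behavior input
    w = 0.3 * rng.standard_normal(q)        # simulated disturbance probe
    xn = A @ x + B @ u + E @ w
    data.append((x, u, w, xn))
    x = xn

# --- off-policy policy iteration on the Q-function kernel H -----------------
# Q(x, u, w) = z' H z with z = [x; u; w]; target policies u = -K x, w = L x.
d = n + m + q
K = np.zeros((m, n))                        # initial admissible gains (A is
L = np.zeros((q, n))                        # assumed stable, so zeros work)
for it in range(30):
    Phi, r = [], []
    for xk, uk, wk, xk1 in data:
        z = np.concatenate([xk, uk, wk])                # behavior actions
        zn = np.concatenate([xk1, -K @ xk1, L @ xk1])   # target-policy actions
        # Bellman equation row: Q(z) - Q(zn) = utility(xk, uk, wk)
        Phi.append(np.kron(z, z) - np.kron(zn, zn))
        r.append(utility(xk, uk, wk))
    h, *_ = np.linalg.lstsq(np.array(Phi), np.array(r), rcond=None)
    H = h.reshape(d, d)
    H = 0.5 * (H + H.T)                     # keep the symmetric kernel
    # Policy improvement: joint stationarity of Q in (u, w) gives
    # [u; w] = -inv(H_aa) @ H_ax @ x, with 'a' the stacked action block.
    G = np.linalg.solve(H[n:, n:], H[n:, :n])
    K_new, L_new = G[:m, :], -G[m:, :]
    if np.linalg.norm(K_new - K) + np.linalg.norm(L_new - L) < 1e-8:
        K, L = K_new, L_new
        break
    K, L = K_new, L_new

print("controller gain K:", K)
print("worst-case disturbance gain L:", L)
print("closed-loop spectral radius:",
      max(abs(np.linalg.eigvals(A - B @ K + E @ L))))
```

The off-policy feature the abstract emphasizes appears in the Bellman row: the data actions (z) come from an exploratory behavior policy, while the successor actions (zn) come from the current target gains K and L, so the same fixed batch of measurements is reused across all iterations.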
Pages: 28831 - 28846
Number of pages: 16
Related Papers
50 records in total
  • [41] Optimal Control for Interconnected Multi-Area Power Systems With Unknown Dynamics: An Off-Policy Q-Learning Method
    Wang, Jing
    Mi, Xuanrui
    Shen, Hao
    Park, Ju H.
    Shi, Kaibo
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2024, 71 (05) : 2849 - 2853
  • [42] Optimal tracking control for non-zero-sum games of linear discrete-time systems via off-policy reinforcement learning
    Wen, Yinlei
    Zhang, Huaguang
    Su, Hanguang
    Ren, He
    OPTIMAL CONTROL APPLICATIONS & METHODS, 2020, 41 (04) : 1233 - 1250
  • [43] A novel policy iteration based deterministic Q-learning for discrete-time nonlinear systems
    Wei QingLai
    Liu DeRong
    SCIENCE CHINA-INFORMATION SCIENCES, 2015, 58 (12) : 1 - 15
  • [45] H∞ Tracking Control for Linear Discrete-Time Systems: Model-Free Q-Learning Designs
    Yang, Yunjie
    Wan, Yan
    Zhu, Jihong
    Lewis, Frank L.
    IEEE CONTROL SYSTEMS LETTERS, 2021, 5 (01) : 175 - 180
  • [46] Off-Policy L2-Gain Control for Discrete-Time Linear Systems with Dropout
    Huang, Deng
    Zhang, Cong
    Feng, Qian
    INTELLIGENT NETWORKED THINGS, CINT 2024, PT I, 2024, 2138 : 139 - 151
  • [47] Off-policy reinforcement learning for tracking control of discrete-time Markov jump linear systems with completely unknown dynamics
    Huang, Z.
    Tu, Y.
    Fang, H.
    Wang, H.
    Zhang, L.
    Shi, K.
    He, S.
    JOURNAL OF THE FRANKLIN INSTITUTE, 2023, 360 (03) : 2361 - 2378
  • [48] Improved Q-Learning Method for Linear Discrete-Time Systems
    Chen, Jian
    Wang, Jinhua
    Huang, Jie
    PROCESSES, 2020, 8 (03)
  • [49] Neural-network-based learning algorithms for cooperative games of discrete-time multi-player systems with control constraints via adaptive dynamic programming
    Jiang, He
    Zhang, Huaguang
    Xie, Xiangpeng
    Han, Ji
    NEUROCOMPUTING, 2019, 344 : 13 - 19
  • [50] Q-Learning Methods for LQR Control of Completely Unknown Discrete-Time Linear Systems
    Fan, Wenwu
    Xiong, Junlin
    IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2025, 22 : 5933 - 5943