H∞ Control for Discrete-Time Multi-Player Systems via Off-Policy Q-Learning

Cited by: 4
Authors
Li, Jinna [1, 2]
Xiao, Zhenfei [1]
Affiliations
[1] Liaoning Shihua Univ, Sch Informat & Control Engn, Fushun 113001, Liaoning, Peoples R China
[2] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110819, Peoples R China
Source
IEEE ACCESS | 2020 / Vol. 8 / Issue 08
Funding
National Natural Science Foundation of China;
Keywords
H-infinity control; off-policy Q-learning; game theory; Nash equilibrium; ZERO-SUM GAMES; STATIC OUTPUT-FEEDBACK; DIFFERENTIAL GRAPHICAL GAMES; OPTIMAL TRACKING CONTROL; ADAPTIVE OPTIMAL-CONTROL; POLE ASSIGNMENT; LINEAR-SYSTEMS; SYNCHRONIZATION; ALGORITHM; DESIGNS;
DOI
10.1109/ACCESS.2020.2970760
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This paper presents a novel off-policy game Q-learning algorithm that solves the H∞ control problem for discrete-time linear multi-player systems with completely unknown system dynamics. The primary contribution is that the Q-learning strategy in the proposed algorithm is implemented through off-policy policy iteration rather than on-policy learning, since off-policy learning has well-known advantages over on-policy learning. All players work together to minimize their common performance index while defeating the disturbance, which tries to maximize that performance index; they finally reach the Nash equilibrium of the game, at which the disturbance attenuation condition is satisfied. To find the Nash equilibrium solution, the H∞ control problem is first transformed into an optimal control problem. An off-policy Q-learning algorithm is then put forward within the typical adaptive dynamic programming (ADP) and game architecture, so that the control policies of all players can be learned using only measured data. More importantly, a rigorous proof is given that the Nash equilibrium solution obtained by the proposed off-policy game Q-learning algorithm is unbiased. Comparative simulation results verify the effectiveness and demonstrate the advantages of the proposed method.
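For orientation, the setting described in the abstract can be sketched with a common textbook formulation of discrete-time H∞ control as a zero-sum game. All symbols below (A, B_i, E, Q, R_i, γ, H, n, z_k) are assumed notation and do not come from this record; the paper's own formulation may differ.

```latex
% Assumed notation: x_k is the state, u_{i,k} the input of player i (i = 1,...,n),
% d_k the disturbance; A, B_i, E are unknown system matrices.
x_{k+1} = A x_k + \sum_{i=1}^{n} B_i u_{i,k} + E d_k

% Disturbance attenuation condition for a prescribed level \gamma > 0
% (stated here for zero initial state, x_0 = 0):
\sum_{k=0}^{\infty} \Big( x_k^{\top} Q x_k + \sum_{i=1}^{n} u_{i,k}^{\top} R_i u_{i,k} \Big)
  \le \gamma^{2} \sum_{k=0}^{\infty} d_k^{\top} d_k

% Associated zero-sum game: the players minimize and the disturbance maximizes
J = \sum_{k=0}^{\infty} \Big( x_k^{\top} Q x_k + \sum_{i=1}^{n} u_{i,k}^{\top} R_i u_{i,k}
      - \gamma^{2} d_k^{\top} d_k \Big),
\qquad \min_{u_1,\dots,u_n} \, \max_{d} \; J

% For linear feedback policies the Q-function is quadratic in the stacked vector z_k:
\mathcal{Q}(x_k, u_{1,k}, \dots, u_{n,k}, d_k) = z_k^{\top} H z_k,
\qquad z_k = \begin{bmatrix} x_k^{\top} & u_{1,k}^{\top} & \cdots & u_{n,k}^{\top} & d_k^{\top} \end{bmatrix}^{\top}
```

In this kind of formulation, a Q-learning scheme estimates the matrix H from measured data, and the feedback gains of the players and of the disturbance are then read off from its blocks, which is why no model of A, B_i, or E is needed.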
Pages: 28831 - 28846
Page count: 16
Related papers
50 in total
  • [31] Iterative ADP learning algorithms for discrete-time multi-player games
    Jiang, He
    Zhang, Huaguang
    ARTIFICIAL INTELLIGENCE REVIEW, 2018, 50 (01) : 75 - 91
  • [33] Efficient off-policy Q-learning for multi-agent systems by solving dual games
    Wang, Yan
    Xue, Huiwen
    Wen, Jiwei
    Liu, Jinfeng
    Luan, Xiaoli
    INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2024, 34 (06) : 4193 - 4212
  • [34] Online Adaptive Optimal Control of Discrete-time Linear Systems via Synchronous Q-learning
    Li, Xinxing
    Wang, Xueyuan
    Zha, Wenzhong
    2021 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2021, : 2024 - 2029
  • [35] Neighbor Q-learning based consensus control for discrete-time multi-agent systems
    Zhu, Xiaoxia
    Yuan, Xin
    Dong, Lu
    Wang, Yuanda
    Sun, Changyin
    OPTIMAL CONTROL APPLICATIONS & METHODS, 2023, 44 (03) : 1475 - 1490
  • [36] Reinforcement Q-learning algorithm for H∞ tracking control of discrete-time Markov jump systems
    Shi, Jiahui
    He, Dakuo
    Zhang, Qiang
    INTERNATIONAL JOURNAL OF SYSTEMS SCIENCE, 2025, 56 (03) : 502 - 523
  • [37] Reinforcement Q-Learning Algorithm for H∞ Tracking Control of Unknown Discrete-Time Linear Systems
    Peng, Yunjian
    Chen, Qian
    Sun, Weijie
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2020, 50 (11) : 4109 - 4122
  • [38] Actor-Critic Off-Policy Learning for Optimal Control of Multiple-Model Discrete-Time Systems
    Skach, Jan
    Kiumarsi, Bahare
    Lewis, Frank L.
    Straka, Ondrej
    IEEE TRANSACTIONS ON CYBERNETICS, 2018, 48 (01) : 29 - 40
  • [39] Model-free H∞ control design for unknown linear discrete-time systems via Q-learning with LMI
    Kim, J. -H.
    Lewis, F. L.
    AUTOMATICA, 2010, 46 (08) : 1320 - 1326
  • [40] Off-Policy Reinforcement Learning for Optimal Preview Tracking Control of Linear Discrete-Time systems with unknown dynamics
    Wang, Chao-Ran
    Wu, Huai-Ning
    2018 CHINESE AUTOMATION CONGRESS (CAC), 2018, : 1402 - 1407