H∞ Control for Discrete-Time Multi-Player Systems via Off-Policy Q-Learning

Cited by: 4

Authors:

Li, Jinna [1,2]

Xiao, Zhenfei [1]

Affiliations:

[1] Liaoning Shihua Univ, Sch Informat & Control Engn, Fushun 113001, Liaoning, Peoples R China

[2] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110819, Peoples R China

Source:

IEEE ACCESS | 2020 / Vol. 8 / Issue 08

Funding:

National Natural Science Foundation of China;

Keywords:

H-infinity control; off-policy Q-learning; game theory; Nash equilibrium; ZERO-SUM GAMES; STATIC OUTPUT-FEEDBACK; DIFFERENTIAL GRAPHICAL GAMES; OPTIMAL TRACKING CONTROL; ADAPTIVE OPTIMAL-CONTROL; POLE ASSIGNMENT; LINEAR-SYSTEMS; SYNCHRONIZATION; ALGORITHM; DESIGNS;

DOI:

10.1109/ACCESS.2020.2970760

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812;

Abstract:

This paper presents a novel off-policy game Q-learning algorithm for solving the H∞ control problem for discrete-time linear multi-player systems with completely unknown system dynamics. The primary contribution is that the Q-learning strategy in the proposed algorithm is implemented through off-policy policy iteration rather than on-policy learning, since off-policy learning has well-known advantages over on-policy learning. All players work together to minimize their common performance index while defeating the disturbance, which tries to maximize the specific performance index, and they finally reach the Nash equilibrium of the game, at which the disturbance attenuation condition is satisfied. To find the Nash equilibrium solution, the H∞ control problem is first transformed into an optimal control problem. An off-policy Q-learning algorithm is then developed within the typical adaptive dynamic programming (ADP) and game architecture, so that the control policies of all players can be learned using only measured data. More importantly, a rigorous proof is given that the Nash equilibrium solution obtained by the proposed off-policy game Q-learning algorithm is unbiased. Comparative simulation results verify the effectiveness and demonstrate the advantages of the proposed method.

Pages: 28831-28846

Page count: 16
