Data-Efficient Off-Policy Learning for Distributed Optimal Tracking Control of HMAS With Unidentified Exosystem Dynamics

Cited: 19
Authors
Xu, Yong [1 ]
Wu, Zheng-Guang [2 ]
Affiliations
[1] Beijing Inst Technol, Sch Automat, Beijing 100081, Peoples R China
[2] Zhejiang Univ, Inst Cyber Syst & Control, Hangzhou 310027, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Heuristic algorithms; Observers; Mathematical models; Approximation algorithms; Multi-agent systems; Symmetric matrices; Regulation; Adaptive observer; approximate dynamic programming (ADP); heterogeneous multiagent systems (HMASs); output tracking; reinforcement learning (RL); COOPERATIVE OUTPUT REGULATION; LINEAR MULTIAGENT SYSTEMS; ADAPTIVE OPTIMAL-CONTROL; CONTINUOUS-TIME SYSTEMS; SYNCHRONIZATION; CONSENSUS; OBSERVER; FEEDBACK;
DOI
10.1109/TNNLS.2022.3172130
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In this article, a data-efficient off-policy reinforcement learning (RL) approach is proposed for distributed output tracking control of heterogeneous multiagent systems (HMASs) using approximate dynamic programming (ADP). Unlike existing results, in which the kinematic model of the exosystem is accessible to some or all agents, the dynamics of the exosystem are assumed here to be completely unknown to all agents. To overcome this difficulty, an identification algorithm based on the experience-replay method is designed for each agent to identify the system matrices of a novel reference model rather than those of the original exosystem. Then, an output-based distributed adaptive output observer is proposed to estimate the leader's state; the observer not only has a low dimension and requires less data transmission among agents, but is also implemented in a fully distributed way. In addition, a data-efficient RL algorithm is given to design the optimal controller offline along the system trajectories without solving the output regulator equations. An ADP approach is developed to iteratively solve the game algebraic Riccati equations (GAREs) online using measured state and input information, which relaxes the requirement of prior knowledge of the agents' system matrices. Finally, a numerical example is provided to verify the effectiveness of the theoretical analysis.
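The abstract's core computational step, iteratively solving an algebraic Riccati equation by policy iteration, has a well-known model-based backbone: Kleinman's algorithm, which alternates a Lyapunov-equation policy evaluation with a gain update. The sketch below is only an illustrative analogue for a single-agent LQR problem, not the paper's GARE-based, data-driven algorithm; the system matrices `A`, `B`, `Q`, `R` are made-up toy values. Off-policy ADP, as the abstract notes, would replace the Lyapunov solve with least squares on measured state and input trajectories, removing the need to know `A`.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

def kleinman_policy_iteration(A, B, Q, R, K0, iters=20):
    """Model-based policy iteration (Kleinman, 1968) for the
    continuous-time ARE  A'P + P A - P B R^{-1} B' P + Q = 0.
    K0 must be a stabilizing initial gain."""
    K = K0
    for _ in range(iters):
        Ak = A - B @ K                    # closed-loop matrix under current gain
        # Policy evaluation: solve  Ak' P + P Ak = -(Q + K' R K)
        P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
        K = np.linalg.solve(R, B.T @ P)   # policy improvement
    return P, K

# Toy second-order example (illustrative values only).
A = np.array([[0.0, 1.0], [-1.0, -1.0]])  # Hurwitz, so K0 = 0 is stabilizing
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P_pi, K_pi = kleinman_policy_iteration(A, B, Q, R, K0=np.zeros((1, 2)))
P_are = solve_continuous_are(A, B, Q, R)  # direct solver, for comparison
print(np.allclose(P_pi, P_are, atol=1e-8))
```

The iteration converges quadratically to the unique stabilizing ARE solution, which is why a handful of least-squares "evaluation" steps suffices in the data-driven variants.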
Pages: 3181-3190 (10 pages)