Data-Efficient Off-Policy Learning for Distributed Optimal Tracking Control of HMAS With Unidentified Exosystem Dynamics

Cited: 19
Authors
Xu, Yong [1 ]
Wu, Zheng-Guang [2 ]
Affiliations
[1] Beijing Inst Technol, Sch Automat, Beijing 100081, Peoples R China
[2] Zhejiang Univ, Inst Cyber Syst & Control, Hangzhou 310027, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Heuristic algorithms; Observers; Mathematical models; Approximation algorithms; Multi-agent systems; Symmetric matrices; Regulation; Adaptive observer; approximate dynamic programming (ADP); heterogeneous multiagent systems (HMASs); output tracking; reinforcement learning (RL); COOPERATIVE OUTPUT REGULATION; LINEAR MULTIAGENT SYSTEMS; ADAPTIVE OPTIMAL-CONTROL; CONTINUOUS-TIME SYSTEMS; SYNCHRONIZATION; CONSENSUS; OBSERVER; FEEDBACK;
DOI
10.1109/TNNLS.2022.3172130
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In this article, a data-efficient off-policy reinforcement learning (RL) approach is proposed for distributed output tracking control of heterogeneous multiagent systems (HMASs) using approximate dynamic programming (ADP). Unlike existing results, in which the kinematic model of the exosystem is accessible to some or all agents, the dynamics of the exosystem are assumed here to be completely unknown to all agents. To overcome this difficulty, an identification algorithm based on the experience-replay method is designed for each agent to identify the system matrices of a novel reference model rather than those of the original exosystem. Then, an output-based distributed adaptive output observer is proposed to estimate the leader's state; the observer not only has a low dimension and requires less data transmission among agents, but is also implemented in a fully distributed way. In addition, a data-efficient RL algorithm is given to design the optimal controller offline along the system trajectories without solving the output regulator equations. An ADP approach is developed to iteratively solve the game algebraic Riccati equations (GAREs) online using measured state and input information, which relaxes the requirement of prior knowledge of the agents' system matrices. Finally, a numerical example is provided to verify the effectiveness of the theoretical analysis.
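The abstract's core computational step, iteratively solving an algebraic Riccati equation by policy iteration, has a well-known model-based backbone: Kleinman's algorithm, which alternates a Lyapunov-equation policy evaluation with a gain update. The sketch below is only an illustrative analogue for a single-agent LQR problem, not the paper's GARE-based, data-driven algorithm; the system matrices `A`, `B`, `Q`, `R` are made-up toy values. Off-policy ADP, as the abstract notes, would replace the Lyapunov solve with least squares on measured state and input trajectories, removing the need to know `A`.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

def kleinman_policy_iteration(A, B, Q, R, K0, iters=20):
    """Model-based policy iteration (Kleinman, 1968) for the
    continuous-time ARE  A'P + P A - P B R^{-1} B' P + Q = 0.
    K0 must be a stabilizing initial gain."""
    K = K0
    for _ in range(iters):
        Ak = A - B @ K                    # closed-loop matrix under current gain
        # Policy evaluation: solve  Ak' P + P Ak = -(Q + K' R K)
        P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
        K = np.linalg.solve(R, B.T @ P)   # policy improvement
    return P, K

# Toy second-order example (illustrative values only).
A = np.array([[0.0, 1.0], [-1.0, -1.0]])  # Hurwitz, so K0 = 0 is stabilizing
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P_pi, K_pi = kleinman_policy_iteration(A, B, Q, R, K0=np.zeros((1, 2)))
P_are = solve_continuous_are(A, B, Q, R)  # direct solver, for comparison
print(np.allclose(P_pi, P_are, atol=1e-8))
```

The iteration converges quadratically to the unique stabilizing ARE solution, which is why a handful of least-squares "evaluation" steps suffices in the data-driven variants.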
Pages: 3181-3190 (10 pages)