Data-Efficient Off-Policy Learning for Distributed Optimal Tracking Control of HMAS With Unidentified Exosystem Dynamics

Cited: 19
Authors
Xu, Yong [1 ]
Wu, Zheng-Guang [2 ]
Affiliations
[1] Beijing Inst Technol, Sch Automat, Beijing 100081, Peoples R China
[2] Zhejiang Univ, Inst Cyber Syst & Control, Hangzhou 310027, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
Heuristic algorithms; Observers; Mathematical models; Approximation algorithms; Multi-agent systems; Symmetric matrices; Regulation; Adaptive observer; approximate dynamic programming (ADP); heterogeneous multiagent systems (HMASs); output tracking; reinforcement learning (RL); COOPERATIVE OUTPUT REGULATION; LINEAR MULTIAGENT SYSTEMS; ADAPTIVE OPTIMAL-CONTROL; CONTINUOUS-TIME SYSTEMS; SYNCHRONIZATION; CONSENSUS; OBSERVER; FEEDBACK;
DOI
10.1109/TNNLS.2022.3172130
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this article, a data-efficient off-policy reinforcement learning (RL) approach is proposed for distributed output tracking control of heterogeneous multiagent systems (HMASs) using approximate dynamic programming (ADP). Unlike existing results, in which the exosystem's dynamics are accessible to some or all agents, the dynamics of the exosystem here are assumed to be completely unknown to all agents. To overcome this difficulty, an identification algorithm based on experience replay is designed for each agent to estimate the system matrices of a novel reference model rather than those of the original exosystem. Then, an output-based distributed adaptive output observer is proposed to estimate the leader's state; the proposed observer not only has lower dimension and requires less data transmission among agents, but is also implemented in a fully distributed way. In addition, a data-efficient RL algorithm is given to design the optimal controller offline along the system trajectories without solving the output regulator equations. An ADP approach is developed to iteratively solve the game algebraic Riccati equations (GAREs) using online state and input data, which removes the requirement of offline prior knowledge of the agents' system matrices. Finally, a numerical example is provided to verify the effectiveness of the theoretical analysis.
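The paper's ADP scheme iteratively solves Riccati equations from measured state and input data. As a point of reference, the model-based backbone of such schemes is Kleinman's policy iteration for the continuous-time algebraic Riccati equation (ARE): repeatedly evaluate the current gain via a Lyapunov equation, then improve it. The sketch below is a minimal model-based illustration of that iteration (not the paper's data-driven, model-free algorithm, which replaces the explicit use of the system matrices with trajectory data); all variable names here are illustrative.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

def kleinman_iteration(A, B, Q, R, K0, iters=20):
    """Model-based policy iteration (Kleinman's algorithm) for the
    continuous-time ARE. Off-policy ADP methods perform the same
    evaluation/improvement steps using online data instead of (A, B)."""
    K = K0
    for _ in range(iters):
        Ac = A - B @ K  # closed-loop matrix under the current gain
        # Policy evaluation: solve Ac^T P + P Ac + Q + K^T R K = 0
        P = solve_continuous_lyapunov(Ac.T, -(Q + K.T @ R @ K))
        # Policy improvement: K <- R^{-1} B^T P
        K = np.linalg.solve(R, B.T @ P)
    return P, K

# Double-integrator example with an initial stabilizing gain K0
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[1.0, 1.0]])

P, K = kleinman_iteration(A, B, Q, R, K0)
P_are = solve_continuous_are(A, B, Q, R)
print(np.allclose(P, P_are, atol=1e-8))  # iterates converge to the ARE solution
```

Starting from any stabilizing gain, the Lyapunov solutions decrease monotonically to the stabilizing ARE solution; the data-driven version in the paper obtains the same fixed point without knowing the agents' system matrices.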
Pages: 3181-3190
Page count: 10
Related Papers
50 results total
  • [1] Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
    Thomas, Philip S.
    Brunskill, Emma
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [2] Data-efficient Hindsight Off-policy Option Learning
    Wulfmeier, Markus
    Rao, Dushyant
    Hafner, Roland
    Lampe, Thomas
    Abdolmaleki, Abbas
    Hertweck, Tim
    Neunert, Michael
    Tirumala, Dhruva
    Siegel, Noah
    Heess, Nicolas
    Riedmiller, Martin
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [3] Off-policy Q-learning: Optimal tracking control for networked control systems
    Li J.-N.
    Yin Z.-X.
    Kongzhi yu Juece/Control and Decision, 2019, 34 (11): : 2343 - 2349
  • [4] H∞ Optimal Distributed Tracking Control of Network Distributed Systems over Directed Networks via Off-Policy Reinforcement Learning
    Kucuksayacigil, Gulnihal
    2023 EUROPEAN CONTROL CONFERENCE, ECC, 2023,
  • [5] Data-Efficient Control Policy Search using Residual Dynamics Learning
    Saveriano, Matteo
    Yin, Yuchao
    Falco, Pietro
    Lee, Dongheui
    2017 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2017, : 4709 - 4715
  • [6] Off-Policy Reinforcement Learning for Optimal Preview Tracking Control of Linear Discrete-Time systems with unknown dynamics
    Wang, Chao-Ran
    Wu, Huai-Ning
    2018 CHINESE AUTOMATION CONGRESS (CAC), 2018, : 1402 - 1407
  • [7] Robust optimal tracking control for multiplayer systems by off-policy Q-learning approach
    Li, Jinna
    Xiao, Zhenfei
    Li, Ping
    Cao, Jiangtao
    INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2021, 31 (01) : 87 - 106
  • [8] Data-Efficient Constrained Learning for Optimal Tracking of Batch Processes
    Zhou, Yuanqiang
    Gao, Kaihua
    Li, Dewei
    Xu, Zuhua
    Gao, Furong
    INDUSTRIAL & ENGINEERING CHEMISTRY RESEARCH, 2021, 60 (43) : 15658 - 15668
  • [9] Off-policy integral reinforcement learning optimal tracking control for continuous-time chaotic systems
    Wei Qing-Lai
    Song Rui-Zhuo
    Sun Qiu-Ye
    Xiao Wen-Dong
    CHINESE PHYSICS B, 2015, 24 (09)
  • [10] Off-policy integral reinforcement learning optimal tracking control for continuous-time chaotic systems
    Wei Qing-Lai
    Song Rui-Zhuo
    Sun Qiu-Ye
    Xiao Wen-Dong
    Chinese Physics B, 2015, 24 (09) : 151 - 156