Efficient off-policy Q-learning for multi-agent systems by solving dual games

Cited by: 1
Authors
Wang, Yan [1 ]
Xue, Huiwen [1 ]
Wen, Jiwei [1 ,3 ]
Liu, Jinfeng [2 ]
Luan, Xiaoli [1 ]
Affiliations
[1] Jiangnan Univ, Sch Internet Things Engn, Key Lab Adv Proc Control Light Ind, Minist Educ, Wuxi, Peoples R China
[2] Univ Alberta, Dept Chem & Mat Engn, Edmonton, AB, Canada
[3] Jiangnan Univ, Sch Internet Things Engn, Key Lab Adv Proc Control Light Ind, Minist Educ, Wuxi 214122, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
dual games; momentum policy gradient; multi-agent systems; off-policy; OPTIMAL CONSENSUS CONTROL; SYNCHRONIZATION;
DOI
10.1002/rnc.7189
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline code
0812;
Abstract
This article develops distributed optimal control policies via Q-learning for multi-agent systems (MASs) by solving dual games. According to game theory, first, the distributed consensus problem is formulated as a multi-player non-zero-sum game, where each agent is viewed as a player focusing only on its local performance and the whole MAS achieves Nash equilibrium. Second, for each agent, the anti-disturbance problem is formulated as a two-player zero-sum game, in which the control input and the external disturbance are a pair of opponents. Specifically, (1) an offline, data-driven, off-policy distributed tracking algorithm based on momentum policy gradient (MPG) is developed, which effectively achieves consensus of MASs with a guaranteed $l_2$-bounded synchronization error. (2) An actor-critic-disturbance neural network is employed to implement the MPG algorithm and obtain the optimal policies. Finally, numerical and practical simulations are conducted to verify the effectiveness of the tracking policies developed via the MPG algorithm.
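To make the two-player zero-sum formulation and the momentum update concrete, the sketch below shows how a single agent's control gain and disturbance gain might be updated with a momentum policy gradient from off-policy state samples. It is a minimal illustration, not the authors' algorithm: the dimensions, the cost weights Q_X and R_U, the attenuation level gamma, the momentum coefficient beta, the step size, and the quadratic surrogate gradient are all assumptions made for the example.

```python
# Minimal sketch (illustrative assumptions, not the paper's implementation):
# one agent's control gain K (minimizer) and disturbance gain L (maximizer)
# are updated with a momentum policy gradient (MPG) style rule from
# off-policy state samples.
import numpy as np

rng = np.random.default_rng(0)

n_x, n_u, n_w = 4, 2, 2               # state, control, disturbance dimensions (assumed)
Q_X, R_U = np.eye(n_x), np.eye(n_u)   # stage-cost weights (assumed)
gamma = 5.0                           # disturbance attenuation level (assumed)
beta, lr = 0.9, 1e-3                  # momentum coefficient and step size (assumed)

# Linear policies: control u = K @ x, worst-case disturbance w = L @ x.
K = 0.1 * rng.standard_normal((n_u, n_x))
L = 0.1 * rng.standard_normal((n_w, n_x))
m_K, m_L = np.zeros_like(K), np.zeros_like(L)   # momentum buffers

# Off-policy data: states gathered under an arbitrary behaviour policy.
states = rng.standard_normal((64, n_x))

for _ in range(200):
    g_K, g_L = np.zeros_like(K), np.zeros_like(L)
    for x in states:
        u, w = K @ x, L @ x
        # Surrogate gradients of the zero-sum stage cost
        # x'Q_X x + u'R_U u - gamma^2 w'w with respect to K and L.
        # (A full MPG implementation would take these from the critic's Q-function.)
        g_K += 2.0 * np.outer(R_U @ u, x) / len(states)       # descent direction for K
        g_L += 2.0 * gamma**2 * np.outer(w, x) / len(states)  # ascent for L, sign flipped so both steps are descent
    # Momentum policy gradient: accumulate past gradients, then step.
    m_K = beta * m_K + g_K
    m_L = beta * m_L + g_L
    K -= lr * m_K
    L -= lr * m_L

print("K =", np.round(K, 4))
print("L =", np.round(L, 4))
```

The momentum buffers m_K and m_L accumulate past gradient information, which is the ingredient an MPG-style step adds over a plain policy-gradient update; in the paper's setting the gradients would come from the learned critic rather than the quadratic surrogate used here.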
Pages: 4193-4212
Page count: 20
Related papers
50 records in total
  • [21] Off-Policy Learning for Bipartite Output Regulation of Heterogeneous Multi-Agent Systems under Actuator Faults
    Zhou, Yan
    Wen, Guanghui
    2023 62ND ANNUAL CONFERENCE OF THE SOCIETY OF INSTRUMENT AND CONTROL ENGINEERS, SICE, 2023, : 1388 - 1393
  • [22] Continuous Q-Learning for Multi-Agent Cooperation
    Hwang, Kao-Shing
    Jiang, Wei-Cheng
    Lin, Yu-Hong
    Lai, Li-Hsin
    CYBERNETICS AND SYSTEMS, 2012, 43 (03) : 227 - 256
  • [23] Untangling Braids with Multi-Agent Q-Learning
    Khan, Abdullah
    Vernitski, Alexei
    Lisitsa, Alexei
    2021 23RD INTERNATIONAL SYMPOSIUM ON SYMBOLIC AND NUMERIC ALGORITHMS FOR SCIENTIFIC COMPUTING (SYNASC 2021), 2021, : 135 - 139
  • [24] Q-learning with FCMAC in multi-agent cooperation
    Hwang, Kao-Shing
    Chen, Yu-Jen
    Lin, Tzung-Feng
    ADVANCES IN NEURAL NETWORKS - ISNN 2006, PT 1, 2006, 3971 : 599 - 606
  • [25] Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction
    Kumar, Aviral
    Fu, Justin
    Tucker, George
    Levine, Sergey
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [26] An efficient data-based off-policy Q-learning algorithm for optimal output feedback control of linear systems
    Alsalti, Mohammad
    Lopez, Victor G.
    Mueller, Matthias A.
    6TH ANNUAL LEARNING FOR DYNAMICS & CONTROL CONFERENCE, 2024, 242 : 312 - 323
  • [27] Optimistic-Pessimistic Q-Learning Algorithm for Multi-Agent Systems
    Akchurina, Natalia
    MULTIAGENT SYSTEM TECHNOLOGIES, PROCEEDINGS, 2008, 5244 : 13 - 24
  • [28] A novel multi-agent Q-learning algorithm in cooperative multi-agent system
    Ou, HT
    Zhang, WD
    Zhang, WY
    Xu, XM
    PROCEEDINGS OF THE 3RD WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-5, 2000, : 272 - 276
  • [29] A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning
    Suttle, Wesley
    Yang, Zhuoran
    Zhang, Kaiqing
    Wang, Zhaoran
    Basar, Tamer
    Liu, Ji
    IFAC PAPERSONLINE, 2020, 53 (02): 1549 - 1554
  • [30] Optimal Control for Interconnected Multi-Area Power Systems With Unknown Dynamics: An Off-Policy Q-Learning Method
    Wang, Jing
    Mi, Xuanrui
    Shen, Hao
    Park, Ju H.
    Shi, Kaibo
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2024, 71 (05) : 2849 - 2853