Efficient off-policy Q-learning for multi-agent systems by solving dual games

Cited by: 1
Authors
Wang, Yan [1 ]
Xue, Huiwen [1 ]
Wen, Jiwei [1 ,3 ]
Liu, Jinfeng [2 ]
Luan, Xiaoli [1 ]
Affiliations
[1] Jiangnan Univ, Sch Internet Things Engn, Key Lab Adv Proc Control Light Ind, Minist Educ, Wuxi, Peoples R China
[2] Univ Alberta, Dept Chem & Mat Engn, Edmonton, AB, Canada
[3] Jiangnan Univ, Sch Internet Things Engn, Key Lab Adv Proc Control Light Ind, Minist Educ, Wuxi 214122, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
dual games; momentum policy gradient; multi-agent systems; off-policy; optimal consensus control; synchronization;
DOI
10.1002/rnc.7189
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
This article develops distributed optimal control policies via Q-learning for multi-agent systems (MASs) by solving dual games. Following game theory, first, the distributed consensus problem is formulated as a multi-player non-zero-sum game, in which each agent is viewed as a player that focuses only on its local performance and the whole MAS reaches a Nash equilibrium. Second, for each agent, the anti-disturbance problem is formulated as a two-player zero-sum game, in which the control input and the external disturbance act as a pair of opponents. Specifically, (1) an offline, data-driven, off-policy distributed tracking algorithm based on momentum policy gradient (MPG) is developed, which effectively achieves consensus of MASs with a guaranteed $l_2$-bounded synchronization error; (2) an actor-critic-disturbance neural network is employed to implement the MPG algorithm and obtain the optimal policies. Finally, numerical and practical simulations are conducted to verify the effectiveness of the tracking policies developed via the MPG algorithm.
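To make the momentum-policy-gradient idea from the abstract concrete, below is a minimal sketch of a momentum-accelerated policy-gradient loop for a single linear state-feedback gain on a quadratic-cost problem. The whole setting and every symbol (A, B, Q, R, beta, lr) are illustrative assumptions, and the finite-difference gradient estimator merely stands in for a learned gradient; this is not the authors' actual multi-agent MPG algorithm.

```python
import numpy as np

# Sketch: heavy-ball (momentum) policy-gradient update of a linear
# state-feedback policy u = -K x on a finite-horizon quadratic cost.
# All quantities here are assumed for illustration only.

rng = np.random.default_rng(0)
n, m = 4, 2                          # state / input dimensions
A = 0.3 * rng.normal(size=(n, n))    # assumed (stable) open-loop dynamics
B = rng.normal(size=(n, m))
Q, R = np.eye(n), np.eye(m)          # quadratic stage-cost weights
X0 = rng.normal(size=(8, n))         # fixed batch of initial states

def cost(K, horizon=50):
    """Average finite-horizon quadratic cost under the policy u = -K x."""
    total = 0.0
    for x0 in X0:
        x = x0.copy()
        for _ in range(horizon):
            u = -K @ x
            total += x @ Q @ x + u @ R @ u
            x = A @ x + B @ u
    return total / len(X0)

def grad(K, eps=1e-4):
    """Finite-difference estimate of dJ/dK, one gain entry at a time."""
    G = np.zeros_like(K)
    base = cost(K)
    for i in range(K.shape[0]):
        for j in range(K.shape[1]):
            Kp = K.copy()
            Kp[i, j] += eps
            G[i, j] = (cost(Kp) - base) / eps
    return G

K = np.zeros((m, n))                 # initial policy gain
v = np.zeros_like(K)                 # momentum buffer
beta, lr = 0.9, 1e-3                 # momentum coefficient, step size

for it in range(200):
    v = beta * v + grad(K)           # heavy-ball momentum accumulation
    K = K - lr * v                   # gradient step on the policy gain
    if it % 50 == 0:
        print(f"iter {it:3d}  cost {cost(K):.3f}")
```

In the paper's setting, the gradient would instead come from Q-function estimates learned off-policy from data, and each agent would run such an update on its local performance index while simultaneously optimizing against a worst-case disturbance.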
Pages: 4193-4212
Number of pages: 20
Related Papers (50 total)
  • [41] Off-Policy Interleaved Q-Learning: Optimal Control for Affine Nonlinear Discrete-Time Systems
    Li, Jinna
    Chai, Tianyou
    Lewis, Frank L.
    Ding, Zhengtao
    Jiang, Yi
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30 (05) : 1308 - 1320
  • [42] Seeking Nash Equilibrium for Linear Discrete-time Systems via Off-policy Q-learning
    Ni, Haohan
    Ji, Yuxiang
    Yang, Yuxiao
    Zhou, Jianping
    IAENG International Journal of Applied Mathematics, 2024, 54 (11) : 2477 - 2483
  • [43] Using Fuzzy Logic and Q-Learning for Trust Modeling in Multi-agent Systems
    Aref, Abdullah
    Tran, Thomas
    FEDERATED CONFERENCE ON COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2014, 2 : 59 - 66
  • [44] Distributed Consensus-Based Multi-Agent Off-Policy Temporal-Difference Learning
    Stankovic, Milos S.
    Beko, Marko
    Stankovic, Srdjan S.
    2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2021, : 5976 - 5981
  • [45] Adaptive Optimal Control via Q-Learning for Multi-Agent Pursuit-Evasion Games
    Dong, Xu
    Zhang, Huaguang
    Ming, Zhongyang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2024, 71 (06) : 3056 - 3060
  • [46] Adaptive optimal consensus of nonlinear multi-agent systems with unknown dynamics using off-policy integral reinforcement learning
    Yan, Lei
    Liu, Zhi
    Chen, C. L. Philip
    Zhang, Yun
    Wu, Zongze
    NEUROCOMPUTING, 2025, 621
  • [47] An off-policy multi-agent stochastic policy gradient algorithm for cooperative continuous control
    Guo, Delin
    Tang, Lan
    Zhang, Xinggan
    Liang, Ying-chang
    NEURAL NETWORKS, 2024, 170 : 610 - 621
  • [48] Multi-agent off-policy actor-critic algorithm for distributed multi-task reinforcement learning
    Stankovic, Milos S.
    Beko, Marko
    Ilic, Nemanja
    Stankovic, Srdjan S.
    EUROPEAN JOURNAL OF CONTROL, 2023, 74
  • [49] The acquisition of sociality by using Q-learning in a multi-agent environment
    Nagayuki, Yasuo
    PROCEEDINGS OF THE SIXTEENTH INTERNATIONAL SYMPOSIUM ON ARTIFICIAL LIFE AND ROBOTICS (AROB 16TH '11), 2011, : 820 - 823
  • [50] Multi-Agent Q-Learning with Joint State Value Approximation
    Chen, Gang
    Cao, Weihua
    Chen, Xin
    Wu, Min
    2011 30TH CHINESE CONTROL CONFERENCE (CCC), 2011, : 4878 - 4882