Efficient off-policy Q-learning for multi-agent systems by solving dual games

Cited by: 1
Authors
Wang, Yan [1 ]
Xue, Huiwen [1 ]
Wen, Jiwei [1 ,3 ]
Liu, Jinfeng [2 ]
Luan, Xiaoli [1 ]
Affiliations
[1] Jiangnan Univ, Sch Internet Things Engn, Key Lab Adv Proc Control Light Ind, Minist Educ, Wuxi, Peoples R China
[2] Univ Alberta, Dept Chem & Mat Engn, Edmonton, AB, Canada
[3] Jiangnan Univ, Sch Internet Things Engn, Key Lab Adv Proc Control Light Ind, Minist Educ, Wuxi 214122, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
dual games; momentum policy gradient; multi-agent systems; off-policy; OPTIMAL CONSENSUS CONTROL; SYNCHRONIZATION;
DOI
10.1002/rnc.7189
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
This article develops distributed optimal control policies via Q-learning for multi-agent systems (MASs) by solving dual games. Following game theory, the distributed consensus problem is first formulated as a multi-player non-zero-sum game, where each agent is viewed as a player focusing only on its local performance and the whole MAS reaches a Nash equilibrium. Second, for each agent, the anti-disturbance problem is formulated as a two-player zero-sum game, in which the control input and the external disturbance are a pair of opponents. Specifically, (1) an offline, data-driven, off-policy distributed tracking algorithm based on momentum policy gradient (MPG) is developed, which achieves consensus of MASs with a guaranteed $l_2$-bounded synchronization error; (2) an actor-critic-disturbance neural network is employed to implement the MPG algorithm and obtain the optimal policies. Finally, numerical and practical simulations verify the effectiveness of the tracking policies developed via the MPG algorithm.
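For context, a standard formulation consistent with the abstract (a sketch, not taken from the paper itself; the weights $Q_i$, $R_i$, attenuation level $\gamma$, step size $\alpha$, and momentum coefficient $\beta$ below are illustrative assumptions) pairs a two-player zero-sum performance index for each agent with a generic heavy-ball momentum update:

$$J_i(u_i, w_i) = \sum_{k=0}^{\infty} \left( e_{i,k}^{\top} Q_i e_{i,k} + u_{i,k}^{\top} R_i u_{i,k} - \gamma^2 w_{i,k}^{\top} w_{i,k} \right), \qquad u_i^{*} = \arg\min_{u_i} \max_{w_i} J_i(u_i, w_i),$$

$$v_{j+1} = \beta v_j + \nabla_{\theta} J(\theta_j), \qquad \theta_{j+1} = \theta_j - \alpha \, v_{j+1}.$$

Here $e_{i,k}$ is agent $i$'s local synchronization error, $u_{i,k}$ its control input, and $w_{i,k}$ the external disturbance; the min-max structure makes the control and disturbance a pair of opponents, and the second pair of equations is the usual momentum-gradient update on which an MPG-style actor-critic scheme would build. The paper's exact algorithm may differ.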
Pages: 4193-4212
Page count: 20
Related Papers
50 records
  • [1] Discrete-Time Multi-Player Games Based on Off-Policy Q-Learning
    Li, Jinna
    Xiao, Zhenfei
    Li, Ping
    IEEE ACCESS, 2019, 7 : 134647 - 134659
  • [2] Optimal Control for Multi-agent Systems Using Off-Policy Reinforcement Learning
    Wang, Hao
    Chen, Zhiru
    Wang, Jun
    Lu, Lijun
    Li, Mingzhe
    2022 4TH INTERNATIONAL CONFERENCE ON CONTROL AND ROBOTICS, ICCR, 2022, : 135 - 140
  • [3] Off-Policy Action Anticipation in Multi-Agent Reinforcement Learning
    Bighashdel, Ariyan
    de Geus, Daan
    Jancura, Pavol
    Dubbelman, Gijs
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25
  • [4] Q-Learning with Side Information in Multi-Agent Finite Games
    Sylvestre, Mathieu
    Pavel, Lacra
    2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019, : 5032 - 5037
  • [5] Off-policy Q-learning: Solving Nash equilibrium of multi-player games with network-induced delay and unmeasured state
    Li, Jinna
    Xiao, Zhenfei
    Fan, Jialu
    Chai, Tianyou
    Lewis, Frank L.
    AUTOMATICA, 2022, 136
  • [6] Off-Policy Q-Learning for Anti-Interference Control of Multi-Player Systems
    Li, Jinna
    Xiao, Zhenfei
    Chai, Tianyou
    Lewis, Frank L.
    Jagannathan, Sarangapani
    IFAC PAPERSONLINE, 2020, 53 (02): : 9189 - 9194
  • [7] Off-policy Reinforcement Learning for Distributed Output Synchronization of Linear Multi-agent Systems
    Kiumarsi, Bahare
    Lewis, Frank L.
    2017 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2017, : 1877 - 1884
  • [8] Off-policy learning for adaptive optimal output synchronization of heterogeneous multi-agent systems
    Chen, Ci
    Lewis, Frank L.
    Xie, Kan
    Xie, Shengli
    Liu, Yilu
    AUTOMATICA, 2020, 119
  • [9] Off-policy Q-learning: Optimal tracking control for networked control systems
    Li, J.-N.
    Yin, Z.-X.
    Kongzhi yu Juece/Control and Decision, 2019, 34 (11): 2343 - 2349
  • [10] Q-learning in Multi-Agent Cooperation
    Hwang, Kao-Shing
    Chen, Yu-Jen
    Lin, Tzung-Feng
    2008 IEEE WORKSHOP ON ADVANCED ROBOTICS AND ITS SOCIAL IMPACTS, 2008, : 239 - 244