Learning to Coordinate in Multi-Agent Systems: A Coordinated Actor-Critic Algorithm and Finite-Time Guarantees

Cited by: 0
Authors
Zeng, Siliang [1 ]
Chen, Tianyi [2 ]
Garcia, Alfredo [3 ]
Hong, Mingyi [1 ]
Affiliations
[1] Univ Minnesota, Dept Elect & Comp Engn, Minneapolis, MN 55455 USA
[2] Rensselaer Polytech Inst, Dept Elect Comp & Syst Engn, Troy, NY 12181 USA
[3] Texas A&M Univ, Dept Ind & Syst Engn, College Stn, TX 77843 USA
Keywords
Multi-Agent Reinforcement Learning; Actor-Critic; Parameter Sharing; OPTIMIZATION;
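DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract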
Multi-agent reinforcement learning (MARL) has attracted much research attention recently. However, unlike its single-agent counterpart, many theoretical and algorithmic aspects of MARL are not yet well understood. In this paper, we study the emergence of coordinated behavior by autonomous agents using an actor-critic (AC) algorithm. Specifically, we propose and analyze a class of coordinated actor-critic (CAC) algorithms in which individually parametrized policies have a shared part (which is jointly optimized among all agents) and a personalized part (which is only locally optimized). Such partially personalized policies allow agents to coordinate by leveraging peers' experience while adapting to individual tasks. The flexibility in our design allows the proposed CAC algorithm to be used in a fully decentralized setting, where the agents can only communicate with their neighbors, as well as in a federated setting, where the agents occasionally communicate with a server while optimizing their (partially personalized) local models. Theoretically, we show that under some standard regularity assumptions, the proposed CAC algorithm requires $O(\epsilon^{-5/2})$ samples to achieve an $\epsilon$-stationary solution (defined as a solution at which the squared norm of the gradient of the objective function is less than $\epsilon$). To the best of our knowledge, this work provides the first finite-sample guarantee for decentralized AC algorithms with partially personalized policies.
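The split between a shared part and a personalized part can be illustrated with a small sketch. It is not the CAC algorithm from the paper: the actor-critic updates are replaced by plain gradient steps on a toy quadratic loss, and the function name `coordinated_updates`, the target lists, and the mixing matrix are all hypothetical. What it does show is the coordination pattern the abstract describes, under those assumptions: every agent updates both parts locally, but only the shared part is averaged with peers.

```python
from typing import List, Tuple


def coordinated_updates(
    targets_shared: List[float],
    targets_personal: List[float],
    mixing: List[List[float]],
    steps: int = 200,
    lr: float = 0.1,
) -> Tuple[List[float], List[float]]:
    """Toy partially personalized optimization (illustrative assumption).

    Agent i holds a shared parameter s_i and a personalized parameter p_i
    and minimizes 0.5*(s_i - a_i)**2 + 0.5*(p_i - b_i)**2, where a_i and
    b_i are the entries of targets_shared and targets_personal.
    """
    n = len(targets_shared)
    shared = [0.0] * n
    personal = [0.0] * n
    for _ in range(steps):
        # Local gradient step on the shared part, using each agent's own data.
        local_shared = [s - lr * (s - a) for s, a in zip(shared, targets_shared)]
        # The personalized part is only ever updated locally.
        personal = [p - lr * (p - b) for p, b in zip(personal, targets_personal)]
        # Consensus step: agent i mixes the shared part with its peers,
        # weighted by row i of the mixing matrix.
        shared = [
            sum(mixing[i][j] * local_shared[j] for j in range(n))
            for i in range(n)
        ]
    return shared, personal


# Three agents with different local targets (made-up numbers).
a = [1.0, 2.0, 6.0]
b = [-1.0, 0.5, 4.0]
# Uniform weights mimic a server that averages all agents every round.
uniform = [[1.0 / 3.0] * 3 for _ in range(3)]
shared, personal = coordinated_updates(a, b, uniform)
```

With uniform weights the shared parameters agree across agents and approach the mean of the shared targets, while each personalized parameter approaches its own local target. Replacing `uniform` with a sparse, doubly stochastic matrix would mimic the neighbor-only communication of the decentralized setting, although with a constant step size the agents then only reach approximate agreement.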
Pages: 13
Related papers
50 in total
  • [41] Bi-level Multi-Agent Actor-Critic Methods with Transformers
    Wan, Tianjiao
    Mi, Haibo
    Gao, Zijian
    Zhai, Yuanzhao
    Ding, Bo
    Feng, Dawei
    2023 IEEE INTERNATIONAL CONFERENCE ON JOINT CLOUD COMPUTING, JCC, 2023, : 9 - 16
  • [42] Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
    Lowe, Ryan
    Wu, Yi
    Tamar, Aviv
    Harb, Jean
    Abbeel, Pieter
    Mordatch, Igor
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [43] Multi-Agent Actor-Critic for Cooperative Resource Allocation in Vehicular Networks
    Hammami, Nessrine
    Nguyen, Kim Khoa
    PROCEEDINGS OF THE 2022 14TH IFIP WIRELESS AND MOBILE NETWORKING CONFERENCE (WMNC 2022), 2022, : 93 - 100
  • [44] Multi-Agent Reinforcement Learning with General Utilities via Decentralized Shadow Reward Actor-Critic
    Zhang, Junyu
    Bedi, Amrit Singh
    Wang, Mengdi
    Koppel, Alec
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 9031 - 9039
  • [45] Entropy regularized actor-critic based multi-agent deep reinforcement learning for stochastic games
    Hao, Dong
    Zhang, Dongcheng
    Shi, Qi
    Li, Kai
    INFORMATION SCIENCES, 2022, 617 : 17 - 40
  • [46] Accelerating Fuzzy Actor-Critic Learning via Suboptimal Knowledge for a Multi-Agent Tracking Problem
    Wang, Xiao
    Ma, Zhe
    Mao, Lei
    Sun, Kewu
    Huang, Xuhui
    Fan, Changchao
    Li, Jiake
    ELECTRONICS, 2023, 12 (08)
  • [48] Finite-time Consensus Algorithm of Multi-agent Networks
    Khoo, Suiyang
    Xie, Lihua
    Yu, Zhong
    Man, Zhihong
    2008 10TH INTERNATIONAL CONFERENCE ON CONTROL AUTOMATION ROBOTICS & VISION: ICARV 2008, VOLS 1-4, 2008, : 916 - +
  • [49] Multi-Microgrid Collaborative Optimization Scheduling Using an Improved Multi-Agent Soft Actor-Critic Algorithm
    Gao, Jiankai
    Li, Yang
    Wang, Bin
    Wu, Haibo
    ENERGIES, 2023, 16 (07)
  • [50] Differentiable Multi-Agent Actor-Critic for Multi-Step Radiology Report Summarization
    Karn, Sanjeev Kumar
    Liu, Ning
    Schuetze, Hinrich
    Farri, Oladimeji
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 1542 - 1553