Learning to Coordinate in Multi-Agent Systems: A Coordinated Actor-Critic Algorithm and Finite-Time Guarantees

Cited: 0
Authors
Zeng, Siliang [1 ]
Chen, Tianyi [2 ]
Garcia, Alfredo [3 ]
Hong, Mingyi [1 ]
Affiliations
[1] Univ Minnesota, Dept Elect & Comp Engn, Minneapolis, MN 55455 USA
[2] Rensselaer Polytech Inst, Dept Elect Comp & Syst Engn, Troy, NY 12181 USA
[3] Texas A&M Univ, Dept Ind & Syst Engn, College Stn, TX 77843 USA
Keywords
Multi-Agent Reinforcement Learning; Actor-Critic; Parameter Sharing; Optimization
DOI
N/A
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Multi-agent reinforcement learning (MARL) has attracted much research attention recently. However, unlike its single-agent counterpart, many theoretical and algorithmic aspects of MARL are not yet well understood. In this paper, we study the emergence of coordinated behavior among autonomous agents using an actor-critic (AC) algorithm. Specifically, we propose and analyze a class of coordinated actor-critic (CAC) algorithms in which individually parameterized policies have a shared part (which is jointly optimized among all agents) and a personalized part (which is optimized only locally). Such partially personalized policies allow agents to coordinate by leveraging peers' experience while adapting to their individual tasks. The flexibility of our design allows the proposed CAC algorithm to be used in a fully decentralized setting, where agents can only communicate with their neighbors, as well as in a federated setting, where agents occasionally communicate with a server while optimizing their (partially personalized) local models. Theoretically, we show that under standard regularity assumptions, the proposed CAC algorithm requires O(epsilon^{-5/2}) samples to achieve an epsilon-stationary solution (a solution at which the squared norm of the gradient of the objective function is less than epsilon). To the best of our knowledge, this work provides the first finite-sample guarantee for decentralized AC algorithms with partially personalized policies.
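The shared/personalized policy split described in the abstract can be illustrated with a minimal sketch. This is a toy construction, not the paper's actual algorithm: the class name `AgentPolicy`, the flat parameter vectors, and the specific mixing matrix are all assumptions made for illustration. The key structural idea it shows is that each agent updates both parameter blocks locally, but only the shared block is averaged with neighbors through a doubly stochastic mixing matrix (the decentralized setting).

```python
import numpy as np

class AgentPolicy:
    """Policy parameters split into a shared part (mixed with neighbors)
    and a personalized part (updated only locally)."""
    def __init__(self, shared_dim, personal_dim, rng):
        self.shared = rng.standard_normal(shared_dim)
        self.personal = rng.standard_normal(personal_dim)

def local_step(policy, grad_shared, grad_personal, lr=0.1):
    # Actor update: each agent descends its local policy gradient
    # on both the shared and the personalized parameters.
    policy.shared -= lr * grad_shared
    policy.personal -= lr * grad_personal

def mix_shared(policies, W):
    # Decentralized coordination: only the shared parts are averaged
    # with neighbors via a doubly stochastic mixing matrix W;
    # personalized parts are never exchanged.
    stacked = np.stack([p.shared for p in policies])
    mixed = W @ stacked
    for p, row in zip(policies, mixed):
        p.shared = row

# Toy run: three agents; repeated mixing drives the shared parts to
# consensus while the personalized parts remain distinct per agent.
rng = np.random.default_rng(0)
agents = [AgentPolicy(shared_dim=4, personal_dim=2, rng=rng) for _ in range(3)]
W = np.full((3, 3), 0.25) + 0.25 * np.eye(3)  # doubly stochastic, rows sum to 1
for _ in range(50):
    mix_shared(agents, W)
spread = max(np.linalg.norm(a.shared - agents[0].shared) for a in agents)
print(spread)  # near zero: shared parts have reached consensus
```

In the federated setting mentioned in the abstract, the same shared block would instead be averaged occasionally through a server rather than via a neighbor mixing matrix.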
Pages: 13
Related Papers (50 records)
  • [1] On Finite-Time Convergence of Actor-Critic Algorithm
    Qiu, S.; Yang, Z.; Ye, J.; Wang, Z.
    IEEE Journal on Selected Areas in Information Theory, 2021, 2(2): 652-664
  • [2] Asynchronous Actor-Critic for Multi-Agent Reinforcement Learning
    Xiao, Yuchen; Tan, Weihao; Amato, Christopher
    Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022
  • [3] A New Advantage Actor-Critic Algorithm For Multi-Agent Environments
    Paczolay, Gabor; Harmati, Istvan
    2020 23rd IEEE International Symposium on Measurement and Control in Robotics (ISMCR), 2020
  • [4] Actor-Critic Algorithms for Constrained Multi-agent Reinforcement Learning
    Diddigi, Raghuram Bharadwaj; Reddy, D. Sai Koti; Prabuchandran, K. J.; Bhatnagar, Shalabh
    AAMAS '19: Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems, 2019: 1931-1933
  • [5] Multi-Agent Natural Actor-Critic Reinforcement Learning Algorithms
    Trivedi, Prashant; Hemachandra, Nandyala
    Dynamic Games and Applications, 2023, 13(1): 25-55
  • [6] Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning
    Christianos, Filippos; Schafer, Lukas; Albrecht, Stefano V.
    Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020, 33
  • [7] A multi-agent reinforcement learning using Actor-Critic methods
    Li, Chun-Gui; Wang, Meng; Yuan, Qing-Neng
    Proceedings of 2008 International Conference on Machine Learning and Cybernetics, Vols 1-7, 2008: 878-882
  • [8] Distributed Multi-Agent Reinforcement Learning by Actor-Critic Method
    Heredia, Paulo C.; Mou, Shaoshuai
    IFAC PapersOnLine, 2019, 52(20): 363-368
  • [9] Multi-agent actor-critic with time dynamical opponent model
    Tian, Yuan; Kladny, Klaus-Rudolf; Wang, Qin; Huang, Zhiwu; Fink, Olga
    Neurocomputing, 2023, 517: 165-172