Learning to Coordinate in Multi-Agent Systems: A Coordinated Actor-Critic Algorithm and Finite-Time Guarantees

Cited by: 0

Authors:

Zeng, Siliang [1]

Chen, Tianyi [2]

Garcia, Alfredo [3]

Hong, Mingyi [1]

Affiliations:

[1] Univ Minnesota, Dept Elect & Comp Engn, Minneapolis, MN 55455 USA

[2] Rensselaer Polytech Inst, Dept Elect Comp & Syst Engn, Troy, NY 12181 USA

[3] Texas A&M Univ, Dept Ind & Syst Engn, College Stn, TX 77843 USA

Source:

LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, VOL 168 | 2022 / Vol. 168

Keywords:

Multi-Agent Reinforcement Learning; Actor-Critic; Parameter Sharing; Optimization

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Multi-agent reinforcement learning (MARL) has attracted much research attention recently. However, unlike its single-agent counterpart, many theoretical and algorithmic aspects of MARL are not yet well understood. In this paper, we study the emergence of coordinated behavior by autonomous agents using an actor-critic (AC) algorithm. Specifically, we propose and analyze a class of coordinated actor-critic (CAC) algorithms in which individually parametrized policies have a shared part (which is jointly optimized among all agents) and a personalized part (which is only locally optimized). This kind of partially personalized policy allows agents to coordinate by leveraging peers' experience while adapting to their individual tasks. The flexibility in our design allows the proposed CAC algorithm to be used in a fully decentralized setting, where the agents can only communicate with their neighbors, as well as in a federated setting, where the agents occasionally communicate with a server while optimizing their (partially personalized) local models. Theoretically, we show that under some standard regularity assumptions, the proposed CAC algorithm requires O(epsilon^{-5/2}) samples to achieve an epsilon-stationary solution (defined as a solution at which the squared norm of the gradient of the objective function is less than epsilon). To the best of our knowledge, this work provides the first finite-sample guarantee for a decentralized AC algorithm with partially personalized policies.
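The split between a shared and a personalized part of each policy can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract gives no update rules, so the function names (`gradient_step`, `mix_with_neighbors`, `coordinated_round`), the mixing matrix `weights`, the mix-then-step ordering, and the use of plain gradient descent on a loss are all assumptions made for illustration, and the critic is omitted entirely. The only idea taken from the abstract is that shared parameters are coordinated across neighboring agents while personalized parameters are updated purely locally.

```python
from typing import List, Tuple

Vector = List[float]


def gradient_step(params: Vector, grads: Vector, lr: float) -> Vector:
    """Plain gradient descent step on one agent's parameter vector."""
    return [p - lr * g for p, g in zip(params, grads)]


def mix_with_neighbors(shared: List[Vector],
                       weights: List[List[float]]) -> List[Vector]:
    """Replace each agent's shared block by a weighted average over agents.

    weights[i][j] is assumed nonzero only when agent j is agent i itself or
    one of its neighbors, so each agent needs only neighbor communication.
    """
    n, dim = len(shared), len(shared[0])
    return [
        [sum(weights[i][j] * shared[j][k] for j in range(n)) for k in range(dim)]
        for i in range(n)
    ]


def coordinated_round(shared: List[Vector],
                      personal: List[Vector],
                      shared_grads: List[Vector],
                      personal_grads: List[Vector],
                      weights: List[List[float]],
                      lr: float) -> Tuple[List[Vector], List[Vector]]:
    """One illustrative round (hypothetical update order, not from the paper).

    Shared blocks: averaged with neighbors, then updated with local gradients.
    Personalized blocks: updated with local gradients only, never communicated.
    """
    mixed = mix_with_neighbors(shared, weights)
    new_shared = [gradient_step(s, g, lr) for s, g in zip(mixed, shared_grads)]
    new_personal = [gradient_step(p, g, lr)
                    for p, g in zip(personal, personal_grads)]
    return new_shared, new_personal
```

With a doubly stochastic `weights` matrix (rows and columns each summing to one) and zero gradients, repeated rounds drive every agent's shared block toward the network-wide average while leaving the personalized blocks untouched, which is the qualitative behavior the shared/personalized split is meant to capture.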


Pages: 13


[21] Deployment Algorithm of Service Function Chain Based on Multi-Agent Soft Actor-Critic Learning
Tang, Lun
Li, Shirui
Du, Yucong
Chen, Qianbin
JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2023, 45 (08) : 2893 - 2901
[22] Divergence-Regularized Multi-Agent Actor-Critic
Su, Kefan
Lu, Zongqing
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
[23] Multi-agent Attention Actor-Critic Algorithm for Load Balancing in Cellular Networks
Kang, Jikun
Wu, Di
Wang, Ju
Hossain, Ekram
Liu, Xue
Dedek, Gregory
ICC 2023-IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS, 2023, : 5160 - 5165
[24] Toward Resilient Multi-Agent Actor-Critic Algorithms for Distributed Reinforcement Learning
Lin, Yixuan
Gade, Shripad
Sandhu, Romeil
Liu, Ji
2020 AMERICAN CONTROL CONFERENCE (ACC), 2020, : 3953 - 3958
[25] Local Advantage Actor-Critic for Robust Multi-Agent Deep Reinforcement Learning
Xiao, Yuchen
Lyu, Xueguang
Amato, Christopher
2021 INTERNATIONAL SYMPOSIUM ON MULTI-ROBOT AND MULTI-AGENT SYSTEMS (MRS), 2021, : 155 - 163
[26] Multi-agent off-policy actor-critic algorithm for distributed multi-task reinforcement learning
Stankovic, Milos S.
Beko, Marko
Ilic, Nemanja
Stankovic, Srdjan S.
EUROPEAN JOURNAL OF CONTROL, 2023, 74
[27] Finite-Time Analysis of Single-Timescale Actor-Critic
Chen, Xuyang
Zhao, Lin
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
[28] Improving sample efficiency in Multi-Agent Actor-Critic methods
Ye, Zhenhui
Chen, Yining
Jiang, Xiaohong
Song, Guanghua
Yang, Bowei
Fan, Sheng
APPLIED INTELLIGENCE, 2022, 52 (04) : 3691 - 3704
[29] Multi-Agent Actor-Critic with Hierarchical Graph Attention Network
Ryu, Heechang
Shin, Hayong
Park, Jinkyoo
THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 7236 - 7243
