SMIX(λ): Enhancing Centralized Value Functions for Cooperative Multi-Agent Reinforcement Learning

Cited by: 0
Authors
Wen, Chao [1]
Yao, Xinghu [1]
Wang, Yuhui [1]
Tan, Xiaoyang [1]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, MIIT Key Lab Pattern Anal & Machine Intelligence, Collaborat Innovat Ctr Novel Software Technol & I, Nanjing 211106, Peoples R China
Funding
National Science Foundation (USA);
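Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract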
This work presents a sample-efficient and effective value-based method, named SMIX(λ), for multi-agent reinforcement learning (MARL) within the paradigm of centralized training with decentralized execution (CTDE), in which learning a stable and generalizable centralized value function (CVF) is crucial. To achieve this, our method combines several elements: 1) removing the unrealistic centralized greedy assumption during the learning phase, 2) using the λ-return to balance the trade-off between bias and variance and to deal with the environment's non-Markovian property, and 3) adopting experience-replay-style off-policy training. Interestingly, an inherent connection is revealed between SMIX(λ) and the earlier off-policy Q(λ) approach for single-agent learning. Experiments on the StarCraft Multi-Agent Challenge (SMAC) benchmark show that the proposed SMIX(λ) algorithm outperforms several state-of-the-art MARL methods by a large margin, and that it can be used as a general tool to improve the overall performance of a CTDE-type method by enhancing the evaluation quality of its CVF. We open-source our code at: https://github.com/chaovven/SMIX.
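The abstract names the λ-return as the device for trading bias against variance but does not spell out the target. The sketch below shows the textbook backward recursion for a λ-return over a single trajectory. It is not taken from the authors' repository: the function name `lambda_returns`, its arguments, and the default `gamma` and `lam` values are assumptions made for illustration, and the bootstrap values stand in for whatever value estimate a given method supplies.

```python
def lambda_returns(rewards, next_values, gamma=0.99, lam=0.8):
    """Generic lambda-return via the recursion
    G_t = r_t + gamma * ((1 - lam) * v_{t+1} + lam * G_{t+1}).

    rewards[t]     : reward received after step t
    next_values[t] : bootstrap estimate for the state reached after step t
                     (use 0.0 when that state is terminal)
    Both names are hypothetical; they are not the paper's API.
    """
    if len(rewards) != len(next_values):
        raise ValueError("rewards and next_values must have the same length")
    returns = [0.0] * len(rewards)
    # Beyond the last step there is no further return, so the recursion
    # starts from the final bootstrap value.
    g_next = next_values[-1] if rewards else 0.0
    for t in reversed(range(len(rewards))):
        g_next = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * g_next)
        returns[t] = g_next
    return returns


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    next_values = [0.5, 0.5, 0.0]  # last transition ends the episode
    for lam in (0.0, 0.5, 1.0):
        print(lam, lambda_returns(rewards, next_values, gamma=1.0, lam=lam))
```

With `lam=0.0` the recursion reduces to one-step bootstrapped targets (lower variance, more bias from the value estimate); with `lam=1.0` it reduces to the discounted sum of observed rewards plus the final bootstrap (less bias, more variance). Intermediate values interpolate between the two.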
Pages: 7301-7308
Page count: 8
Related papers
50 in total
  • [21] Cooperative Multi-agent Reinforcement Learning for Inventory Management
    Khirwar, Madhav
    Gurumoorthy, Karthik S.
    Jain, Ankit Ajit
    Manchenahally, Shantala
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: APPLIED DATA SCIENCE AND DEMO TRACK, ECML PKDD 2023, PT VI, 2023, 14174 : 619 - 634
  • [22] A review of cooperative multi-agent deep reinforcement learning
    Afshin Oroojlooy
    Davood Hajinezhad
    Applied Intelligence, 2023, 53 : 13677 - 13722
  • [23] A review of cooperative multi-agent deep reinforcement learning
    Oroojlooy, Afshin
    Hajinezhad, Davood
    APPLIED INTELLIGENCE, 2023, 53 (11) : 13677 - 13722
  • [24] Cooperative multi-agent game based on reinforcement learning
    Liu, Hongbo
    HIGH-CONFIDENCE COMPUTING, 2024, 4 (01):
  • [25] Cooperative Exploration for Multi-Agent Deep Reinforcement Learning
    Liu, Iou-Jen
    Jain, Unnat
    Yeh, Raymond A.
    Schwing, Alexander G.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [26] Reinforcement learning of coordination in cooperative multi-agent systems
    Kapetanakis, S
    Kudenko, D
    EIGHTEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-02)/FOURTEENTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE (IAAI-02), PROCEEDINGS, 2002, : 326 - 331
  • [27] Training Cooperative Agents for Multi-Agent Reinforcement Learning
    Bhalla, Sushrut
    Subramanian, Sriram G.
    Crowley, Mark
    AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019, : 1826 - 1828
  • [28] A Unified Framework for Factorizing Distributional Value Functions for Multi-Agent Reinforcement Learning
    Sun, Wei-Fang
    Lee, Cheng-Kuang
    See, Simon
    Lee, Chun-Yi
    JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [29] Improved Cooperative Multi-agent Reinforcement Learning Algorithm Augmented by Mixing Demonstrations from Centralized Policy
    Lee, Hyun-Rok
    Lee, Taesik
    AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019, : 1089 - 1098
  • [30] Analysing factorizations of action-value networks for cooperative multi-agent reinforcement learning
    Jacopo Castellini
    Frans A. Oliehoek
    Rahul Savani
    Shimon Whiteson
    Autonomous Agents and Multi-Agent Systems, 2021, 35