SMIX(λ): Enhancing Centralized Value Functions for Cooperative Multi-Agent Reinforcement Learning

Cited by: 0

Authors:

Wen, Chao [1]

Yao, Xinghu [1]

Wang, Yuhui [1]

Tan, Xiaoyang [1]

Affiliation:

[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, MIIT Key Lab Pattern Anal & Machine Intelligence, Collaborat Innovat Ctr Novel Software Technol & I, Nanjing 211106, Peoples R China

Source:

THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2020, Vol. 34

Funding:

National Science Foundation (USA);

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This work presents a sample-efficient and effective value-based method, named SMIX(λ), for reinforcement learning in multi-agent environments (MARL) within the paradigm of centralized training with decentralized execution (CTDE), in which learning a stable and generalizable centralized value function (CVF) is crucial. To achieve this, our method carefully combines several elements: 1) removing the unrealistic centralized greedy assumption during the learning phase, 2) using the λ-return to balance the trade-off between bias and variance and to deal with the environment's non-Markovian property, and 3) adopting experience-replay-style off-policy training. Interestingly, we reveal an inherent connection between SMIX(λ) and the previous off-policy Q(λ) approach for single-agent learning. Experiments on the StarCraft Multi-Agent Challenge (SMAC) benchmark show that the proposed SMIX(λ) algorithm outperforms several state-of-the-art MARL methods by a large margin, and that it can be used as a general tool to improve the overall performance of a CTDE-type method by enhancing the evaluation quality of its CVF. We open-source our code at: https://github.com/chaovven/SMIX.
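
The λ-return mentioned in the abstract can be illustrated with a minimal sketch. This is a generic textbook-style backward recursion, not the authors' implementation (see their repository for that). The function name `lambda_returns` and its arguments are hypothetical, and `next_values[t]` is assumed to be some bootstrap estimate for the state after step t (0.0 when that state is terminal).

```python
def lambda_returns(rewards, next_values, gamma=0.99, lam=0.8):
    """Lambda-returns for one trajectory (illustrative sketch).

    Uses the backward recursion
        G_t = r_t + gamma * ((1 - lam) * next_values[t] + lam * G_{t+1}),
    where the recursion is seeded with the last bootstrap value, so the
    final step reduces to a one-step target.
    """
    if len(rewards) != len(next_values):
        raise ValueError("rewards and next_values must have equal length")
    returns = [0.0] * len(rewards)
    # Seed: beyond the last step we only have the bootstrap estimate.
    next_return = next_values[-1] if rewards else 0.0
    for t in reversed(range(len(rewards))):
        returns[t] = rewards[t] + gamma * (
            (1.0 - lam) * next_values[t] + lam * next_return
        )
        next_return = returns[t]
    return returns


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    next_values = [0.5, 0.5, 0.0]  # last entry 0.0: episode terminated
    for lam in (0.0, 0.5, 1.0):
        print(lam, lambda_returns(rewards, next_values, gamma=1.0, lam=lam))
```

With `lam=0.0` each target is the one-step bootstrapped return (lower variance, more bias); with `lam=1.0` it becomes the full Monte Carlo return (less bias, more variance); intermediate values interpolate between the two.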

Pages: 7301-7308

Number of pages: 8
