Optimistic sequential multi-agent reinforcement learning with motivational communication

Cited: 0
Authors
Huang, Anqi [1 ]
Wang, Yongli [1 ]
Zhou, Xiaoliang [1 ]
Zou, Haochen [1 ]
Dong, Xu [1 ]
Che, Xun [1 ]
Affiliations
[1] Nanjing University of Science and Technology, School of Computer Science and Engineering, Nanjing 210094, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Multi-agent reinforcement learning; Policy gradient; Motivational communication; Reinforcement learning; Multi-agent system;
DOI
10.1016/j.neunet.2024.106547
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Centralized Training with Decentralized Execution (CTDE) is a prevalent paradigm in fully cooperative Multi-Agent Reinforcement Learning (MARL). Existing algorithms often encounter two major problems: independent strategies tend to underestimate the potential value of actions, leading to convergence to sub-optimal Nash Equilibria (NE), and some communication paradigms add complexity to the learning process, making it harder to focus on the essential elements of the messages. To address these challenges, we propose a novel method called Optimistic Sequential Soft Actor Critic with Motivational Communication (OSSMC). The key idea of OSSMC is to use a greedy-driven approach to explore the potential value of individual policies, yielding optimistic Q-values that serve as an upper bound on the Q-value of the current policy. We then integrate a sequential update mechanism with the optimistic Q-values, aiming to ensure monotonic improvement during joint policy optimization. Moreover, we equip each agent with a motivational communication module that disseminates motivational messages to promote cooperative behavior. Finally, we employ a value regularization strategy from the Soft Actor Critic (SAC) method to maximize entropy and improve exploration. The performance of OSSMC was rigorously evaluated against a series of challenging benchmarks. Empirical results demonstrate that OSSMC not only surpasses current baseline algorithms but also converges more rapidly.
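To make the abstract's two core ideas concrete, the following is a minimal sketch in PyTorch, assuming discrete actions and per-agent networks. It illustrates an optimistic Q-value target (the elementwise max of the soft policy value and a greedy bootstrap, so the target upper-bounds the current policy's value) combined with an agent-by-agent update loop and a SAC-style entropy term. All names (QNet, v_opt), dimensions, and hyperparameters are illustrative assumptions, not from the paper; the motivational communication module and the conditioning of each agent's update on its predecessors' updated actions are omitted for brevity.

```python
# Hypothetical sketch of the ideas in the abstract; NOT the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_AGENTS, OBS_DIM, N_ACTIONS, ALPHA, GAMMA = 3, 8, 4, 0.2, 0.99

class QNet(nn.Module):
    """Per-agent action-value network Q_i(o, .)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, N_ACTIONS))
    def forward(self, obs):
        return self.net(obs)

class PolicyNet(nn.Module):
    """Per-agent stochastic policy pi_i(a | o) over discrete actions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, N_ACTIONS))
    def forward(self, obs):
        return F.softmax(self.net(obs), dim=-1)

qs = [QNet() for _ in range(N_AGENTS)]
pis = [PolicyNet() for _ in range(N_AGENTS)]
opt = torch.optim.Adam([p for m in qs + pis for p in m.parameters()], lr=3e-4)

# Dummy batch standing in for replay-buffer samples.
obs, next_obs = torch.randn(32, OBS_DIM), torch.randn(32, OBS_DIM)
acts = torch.randint(N_ACTIONS, (32, N_AGENTS))
rew, done = torch.randn(32, 1), torch.zeros(32, 1)

loss = torch.tensor(0.0)
for i in range(N_AGENTS):  # sequential, agent-by-agent update
    q_all = qs[i](obs)                          # Q_i(o, .)
    q_taken = q_all.gather(1, acts[:, i:i+1])   # Q_i(o, a_i)
    with torch.no_grad():
        probs = pis[i](next_obs)
        # Soft (SAC-style, entropy-regularized) value of the current policy.
        v_soft = (probs * (qs[i](next_obs)
                           - ALPHA * torch.log(probs + 1e-8))).sum(-1, keepdim=True)
        # Greedy bootstrap: an upper bound on the current policy's value.
        v_greedy = qs[i](next_obs).max(-1, keepdim=True).values
        v_opt = torch.max(v_soft, v_greedy)     # "optimistic" value target
        target = rew + GAMMA * (1 - done) * v_opt
    q_loss = F.mse_loss(q_taken, target)
    probs_now = pis[i](obs)
    # Policy improvement against the frozen critic, plus an entropy bonus.
    pi_loss = (probs_now * (ALPHA * torch.log(probs_now + 1e-8)
                            - q_all.detach())).sum(-1).mean()
    loss = loss + q_loss + pi_loss

opt.zero_grad(); loss.backward(); opt.step()
```

The max over the soft value and the greedy bootstrap is the point of the sketch: it keeps the target from collapsing to a pessimistic estimate of an independent policy, which is the underestimation failure mode the abstract attributes to sub-optimal NE convergence.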
Pages: 12