Efficient Communication via Self-Supervised Information Aggregation for Online and Offline Multiagent Reinforcement Learning

Authors
Guan, Cong [1, 2]
Chen, Feng [1, 2]
Yuan, Lei [3]
Zhang, Zongzhang [1, 2]
Yu, Yang [3]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Peoples R China
[2] Nanjing Univ, Sch Artificial Intelligence, Nanjing 210023, Peoples R China
[3] Polixir Technol, Nanjing 211106, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Benchmark testing; Reinforcement learning; Observability; Training; Learning (artificial intelligence); Decision making; Data mining; Cooperative multiagent reinforcement learning (MARL); multiagent communication; offline learning; representation learning;
DOI
10.1109/TNNLS.2024.3420791
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Utilizing messages from teammates can improve coordination in cooperative multiagent reinforcement learning (MARL). Previous works typically combine the raw messages of teammates with local information as inputs to the policy. However, neglecting message aggregation makes policy learning significantly less efficient. Motivated by recent advances in representation learning, we argue that efficient message aggregation is essential for good coordination in cooperative MARL. In this article, we propose Multiagent communication via Self-supervised Information Aggregation (MASIA), where agents aggregate the received messages into compact, highly relevant representations that augment the local policy. Specifically, we design a permutation-invariant message encoder to generate a common information-aggregated representation from messages and optimize it via reconstructing and shooting future information in a self-supervised manner. Each agent then uses the most relevant parts of the aggregated representation for decision-making through a novel message extraction mechanism. Furthermore, considering the potential of offline learning for real-world applications, we build offline benchmarks for multiagent communication, which are, to our knowledge, the first of their kind. Empirical results demonstrate the superiority of our method in both online and offline settings. We also release the offline benchmarks built in this article as a testbed for validating communication ability, to facilitate future research in this direction.
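The permutation-invariance property mentioned in the abstract can be illustrated with a small sketch. This is not the MASIA encoder: it assumes a simple shared embedding followed by mean pooling, and the names embed, aggregate, weights, and messages are hypothetical. It only shows that such an aggregate does not depend on the order in which teammates' messages arrive.

```python
import random

def embed(message, weights):
    # Shared per-message embedding (assumed for illustration): linear map + ReLU.
    return [max(0.0, sum(w * x for w, x in zip(row, message))) for row in weights]

def aggregate(messages, weights):
    # Mean-pool the embeddings across teammates; the mean ignores message order.
    embedded = [embed(m, weights) for m in messages]
    n = len(embedded)
    return [sum(col) / n for col in zip(*embedded)]

random.seed(0)
msg_dim, hidden, n_agents = 3, 4, 5
weights = [[random.uniform(-1, 1) for _ in range(msg_dim)] for _ in range(hidden)]
messages = [[random.uniform(-1, 1) for _ in range(msg_dim)] for _ in range(n_agents)]

z = aggregate(messages, weights)

# Shuffle the order of teammates' messages and aggregate again.
shuffled = messages[:]
random.shuffle(shuffled)
z_shuffled = aggregate(shuffled, weights)

# The two representations agree up to floating-point rounding.
max_gap = max(abs(a - b) for a, b in zip(z, z_shuffled))
print(max_gap < 1e-9)
```

Mean pooling is only one way to obtain order independence; the abstract does not say which mechanism the encoder uses, so the pooling choice here is an assumption.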
Pages: 13