Efficient Communication via Self-Supervised Information Aggregation for Online and Offline Multiagent Reinforcement Learning

Cited by: 0
Authors
Guan, Cong [1,2]
Chen, Feng [1,2]
Yuan, Lei [3]
Zhang, Zongzhang [1,2]
Yu, Yang [3]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Peoples R China
[2] Nanjing Univ, Sch Artificial Intelligence, Nanjing 210023, Peoples R China
[3] Polixir Technol, Nanjing 211106, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Benchmark testing; Reinforcement learning; Observability; Training; Learning (artificial intelligence); Decision making; Data mining; Cooperative multiagent reinforcement learning (MARL); multiagent communication; offline learning; representation learning;
DOI
10.1109/TNNLS.2024.3420791
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Utilizing messages from teammates can improve coordination in cooperative multiagent reinforcement learning (MARL). Previous works typically feed teammates' raw messages, combined with local information, directly into the policy. However, neglecting message aggregation makes policy learning significantly less efficient. Motivated by recent advances in representation learning, we argue that efficient message aggregation is essential for good coordination in cooperative MARL. In this article, we propose Multiagent communication via Self-supervised Information Aggregation (MASIA), in which agents aggregate the received messages into compact, highly relevant representations that augment the local policy. Specifically, we design a permutation-invariant message encoder that generates a common information-aggregated representation from messages, and we optimize it in a self-supervised manner via reconstruction and shooting future information. Each agent then uses a novel message extraction mechanism to select the most relevant parts of the aggregated representation for decision-making. Furthermore, considering the potential of offline learning for real-world applications, we build offline benchmarks for multiagent communication, which are, to our knowledge, the first of their kind. Empirical results demonstrate the superiority of our method in both online and offline settings. We also release the offline benchmarks built in this article as a testbed for validating communication ability, to facilitate future research in this direction.
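The abstract does not specify how the permutation-invariant message encoder is built. As a rough illustration of the property only, the sketch below uses plain mean pooling as a stand-in aggregator (an assumption, not the MASIA encoder) and checks that reordering teammates' messages leaves the aggregated vector unchanged. The function name `aggregate_messages` and the toy message values are hypothetical.

```python
import random


def aggregate_messages(messages):
    """Mean-pool equal-length message vectors into a single vector.

    Mean pooling is an assumed stand-in for a learned encoder; it is
    permutation-invariant because addition does not depend on order.
    """
    if not messages:
        raise ValueError("need at least one message")
    dim = len(messages[0])
    if any(len(m) != dim for m in messages):
        raise ValueError("all messages must have the same length")
    n = len(messages)
    return [sum(m[i] for m in messages) / n for i in range(dim)]


# Toy messages from three teammates (hypothetical values).
messages = [[1.0, 0.0, 2.0], [3.0, 4.0, 0.0], [2.0, 2.0, 1.0]]
pooled = aggregate_messages(messages)

# Shuffle the teammate order with a fixed seed and aggregate again.
shuffled = messages[:]
random.Random(0).shuffle(shuffled)
pooled_shuffled = aggregate_messages(shuffled)

# The aggregated representation is the same for either ordering.
print(pooled == pooled_shuffled)
```

A learned encoder would replace the mean with trainable layers, but any such design needs the same order-independence so that every agent derives the same common representation from the same set of messages.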
Pages: 13
Related papers
50 in total
  • [31] Learning Aerial Docking via Offline-to-Online Reinforcement Learning
    Tao, Yang
    Feng, Yuting
    Yu, Yushu
    2024 4TH INTERNATIONAL CONFERENCE ON COMPUTER, CONTROL AND ROBOTICS, ICCCR 2024, 2024, : 305 - 309
  • [32] Multi-task Self-Supervised Adaptation for Reinforcement Learning
    Wu, Keyu
    Chen, Zhenghua
    Wu, Min
    Xiang, Shili
    Jin, Ruibing
    Zhang, Le
    Li, Xiaoli
    2022 IEEE 17TH CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA), 2022, : 15 - 20
  • [33] Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning
    Bai, Chenjia
    Liu, Peng
    Liu, Kaiyu
    Wang, Lingxiao
    Zhao, Yingnan
    Han, Lei
    Wang, Zhaoran
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (08) : 4776 - 4790
  • [34] ROLL: Visual Self-Supervised Reinforcement Learning with Object Reasoning
    Wang, Yufei
    Narasimhan, Gautham Narayan
    Lin, Xingyu
    Okorn, Brian
    Held, David
    CONFERENCE ON ROBOT LEARNING, VOL 155, 2020, 155 : 1030 - 1048
  • [35] Self-Supervised Representations for Multi-View Reinforcement Learning
    Yang, Huanhuan
    Shi, Dianxi
    Xie, Guojun
    Peng, Yingxuan
    Zhang, Yi
    Yang, Yantai
    Yang, Shaowu
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, VOL 180, 2022, 180 : 2203 - 2213
  • [36] Self-Supervised Reinforcement Learning that Transfers using Random Features
    Chen, Boyuan
    Zhu, Chuning
    Agrawal, Pulkit
    Zhang, Kaiqing
    Gupta, Abhishek
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [37] Self-Supervised Reinforcement Learning for Proactive Prediction of Passive Intermodulation
    Banerjee, Serene
    Uppuluri, Pratyush Kiran
    Sharma, Rahul N.
    Bandyopadhyay, Subhadip
    2023 15TH INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS & NETWORKS, COMSNETS, 2023,
  • [38] Self-supervised learning for efficient seismic facies classification
    Chikhaoui, Khalil
    Alfarraj, Motaz
    GEOPHYSICS, 2024, 89 (05) : IM61 - IM76
  • [39] Label-Efficient Self-Supervised Speaker Verification With Information Maximization and Contrastive Learning
    Lepage, Theo
    Dehak, Reda
    INTERSPEECH 2022, 2022, : 4018 - 4022
  • [40] Learning Goal-Conditioned Policies Offline with Self-Supervised Reward Shaping
    Mezghani, Lina
    Sukhbaatar, Sainbayar
    Bojanowski, Piotr
    Lazaric, Alessandro
    Alahari, Karteek
    CONFERENCE ON ROBOT LEARNING, VOL 205, 2022, 205 : 1401 - 1410