Efficient Communication via Self-Supervised Information Aggregation for Online and Offline Multiagent Reinforcement Learning

Times Cited: 0
Authors
Guan, Cong [1 ,2 ]
Chen, Feng [1 ,2 ]
Yuan, Lei [3 ]
Zhang, Zongzhang [1 ,2 ]
Yu, Yang [3 ]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Peoples R China
[2] Nanjing Univ, Sch Artificial Intelligence, Nanjing 210023, Peoples R China
[3] Polixir Technol, Nanjing 211106, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Benchmark testing; Reinforcement learning; Observability; Training; Learning (artificial intelligence); Decision making; Data mining; Cooperative multiagent reinforcement learning (MARL); multiagent communication; offline learning; representation learning;
DOI
10.1109/TNNLS.2024.3420791
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Utilizing messages from teammates can improve coordination in cooperative multiagent reinforcement learning (MARL). Previous works typically feed teammates' raw messages, combined with local information, directly into the policy. However, neglecting message aggregation makes policy learning significantly inefficient. Motivated by recent advances in representation learning, we argue that efficient message aggregation is essential for good coordination in cooperative MARL. In this article, we propose Multiagent communication via Self-supervised Information Aggregation (MASIA), in which agents aggregate received messages into compact, highly relevant representations that augment the local policy. Specifically, we design a permutation-invariant message encoder that produces a common aggregated representation from the messages and optimize it in a self-supervised manner by reconstructing the messages and predicting ("shooting") future information. Each agent then extracts the parts of the aggregated representation most relevant to its decision-making through a novel message extraction mechanism. Furthermore, considering the potential of offline learning for real-world applications, we build offline benchmarks for multiagent communication, which, to the best of our knowledge, are the first of their kind. Empirical results demonstrate the superiority of our method in both online and offline settings. We also release these offline benchmarks as a testbed for validating communication ability, to facilitate future research in this direction.
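For intuition, below is a minimal sketch of the kind of permutation-invariant message aggregation with a self-supervised reconstruction objective that the abstract describes. It assumes a PyTorch-style setup; the class and method names are hypothetical, and it omits MASIA's future-information prediction and per-agent message extraction, so it should not be read as the paper's implementation.

import torch
import torch.nn as nn

class PermutationInvariantAggregator(nn.Module):
    # Hypothetical sketch: embed each received message independently,
    # then mean-pool the embeddings, so the aggregated representation
    # is invariant to the order in which messages arrive.
    def __init__(self, msg_dim: int, hidden_dim: int, repr_dim: int):
        super().__init__()
        self.embed = nn.Sequential(
            nn.Linear(msg_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, repr_dim),
        )
        # Decoder used only by the self-supervised reconstruction loss.
        self.decode = nn.Linear(repr_dim, msg_dim)

    def forward(self, messages: torch.Tensor) -> torch.Tensor:
        # messages: (batch, n_agents, msg_dim) -> (batch, repr_dim)
        return self.embed(messages).mean(dim=1)

    def reconstruction_loss(self, messages: torch.Tensor) -> torch.Tensor:
        # Self-supervised objective: reconstruct every raw message from
        # the shared aggregated representation.
        z = self.forward(messages)              # (batch, repr_dim)
        pred = self.decode(z).unsqueeze(1)      # (batch, 1, msg_dim)
        return ((pred - messages) ** 2).mean()

# Example usage with random stand-in messages.
agg = PermutationInvariantAggregator(msg_dim=16, hidden_dim=64, repr_dim=32)
msgs = torch.randn(8, 4, 16)            # 8 samples, messages from 4 teammates
z = agg(msgs)                           # compact representation for the policy
loss = agg.reconstruction_loss(msgs)    # auxiliary loss added during training

Mean pooling over per-message embeddings is one simple way to obtain permutation invariance: reordering the teammates' messages leaves the aggregated representation unchanged.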
Pages: 13