Temporal Inconsistency-Based Intrinsic Reward for Multi-Agent Reinforcement Learning

Cited: 0
Authors
Sun, Shaoqi [1 ]
Xu, Kele [1 ]
Affiliations
[1] Natl Univ Def Technol, Natl Key Lab Parallel & Distributed Proc, Changsha, Peoples R China
DOI
10.1109/IJCNN54540.2023.10191420
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-agent reinforcement learning (MARL) has shown promising results in many challenging sequential decision-making tasks. Recently, deep neural networks have dominated this field. However, the agents' policy networks may fall into local optima during the training phase, which severely constrains exploration performance. To address this issue, we propose a novel MARL learning framework named PSAM, which contains a new temporal inconsistency-based intrinsic reward and a diversity control strategy. Specifically, we save the parameters of the deep models along the optimization path of the agent's policy network; these saved copies are denoted as snapshots. By measuring the difference between snapshots, we can employ that difference as an intrinsic reward. Moreover, we propose a diversity control strategy to improve performance further. Finally, to verify the effectiveness of the proposed method, we conduct extensive experiments in several widely used MARL environments. The results show that in many environments, PSAM can not only achieve state-of-the-art performance and prevent the policy network from getting stuck in local minima but also accelerate the agents' policy learning. It is worth noting that the proposed regularizer can be used in a plug-and-play manner without introducing any additional hyper-parameters or training costs.
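The abstract describes the core mechanism only at a high level: save snapshots of the policy network along the optimization path, measure the difference between snapshots, and use that difference as an intrinsic reward. The PyTorch sketch below illustrates one plausible reading of that idea; the class name, the KL-divergence disagreement measure, and the snapshot schedule are assumptions made for illustration, not PSAM's actual implementation (which also includes a diversity control strategy not shown here).

```python
import copy
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F


class SnapshotIntrinsicReward:
    """Turn disagreement between the current policy and snapshots of
    itself, saved along the optimization path, into an intrinsic reward.
    (Hypothetical sketch; not the paper's released code.)"""

    def __init__(self, policy: nn.Module, max_snapshots: int = 5):
        self.policy = policy
        self.snapshots = deque(maxlen=max_snapshots)

    def save_snapshot(self) -> None:
        # Freeze a full copy of the current parameters; the oldest
        # snapshot is discarded automatically once the deque is full.
        frozen = copy.deepcopy(self.policy).eval()
        for p in frozen.parameters():
            p.requires_grad_(False)
        self.snapshots.append(frozen)

    @torch.no_grad()
    def intrinsic_reward(self, obs: torch.Tensor) -> torch.Tensor:
        """Per-observation bonus: mean KL divergence between the current
        action distribution and each snapshot's distribution (an assumed
        stand-in for the paper's temporal-inconsistency measure)."""
        if not self.snapshots:
            return torch.zeros(obs.shape[0], device=obs.device)
        current = F.softmax(self.policy(obs), dim=-1)
        total = torch.zeros(obs.shape[0], device=obs.device)
        for snap in self.snapshots:
            old_log = F.log_softmax(snap(obs), dim=-1)
            # KL(current || snapshot): large where the policy has
            # changed most since the snapshot was taken.
            total += F.kl_div(old_log, current, reduction="none").sum(-1)
        return total / len(self.snapshots)
```

A hypothetical training-loop usage, consistent with the abstract's claim that the bonus adds no extra hyper-parameters (the snapshot schedule is still an assumption):

```python
# policy, r_env, obs come from the surrounding training loop.
sir = SnapshotIntrinsicReward(policy)
r_total = r_env + sir.intrinsic_reward(obs)  # shape [batch]
sir.save_snapshot()                          # e.g. every K updates (assumed schedule)
```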
Pages: 7