Temporal Inconsistency-Based Intrinsic Reward for Multi-Agent Reinforcement Learning

Cited by: 0
Authors
Sun, Shaoqi [1]
Xu, Kele [1]
Affiliations
[1] Natl Univ Def Technol, Natl Key Lab Parallel & Distributed Proc, Changsha, Peoples R China
DOI
10.1109/IJCNN54540.2023.10191420
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multi-agent reinforcement learning (MARL) has shown promising results in many challenging sequential decision-making tasks. Recently, deep neural networks have dominated this field. However, the agents' policy networks may fall into local optima during the training phase, which severely constrains exploration. To address this issue, we propose a novel MARL framework named PSAM, which contains a new temporal inconsistency-based intrinsic reward and a diversity control strategy. Specifically, we save the parameters of the deep models along the optimization path of the agent's policy network; these saved parameters are denoted as snapshots. We measure the difference between snapshots and employ it as an intrinsic reward. Moreover, we propose a diversity control strategy to improve performance further. Finally, to verify the effectiveness of the proposed method, we conduct extensive experiments in several widely used MARL environments. The results show that in many environments, PSAM not only achieves state-of-the-art performance and prevents the policy network from getting stuck in local minima but also accelerates the agent's learning of the policy. It is worth noting that the proposed regularizer can be used in a plug-and-play manner without introducing any additional hyper-parameters or training costs.
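The snapshot idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the difference between snapshots is measured, so everything specific here is an assumption made for illustration: the linear softmax policy, the KL divergence between action distributions, the averaging over stored snapshots, and the hypothetical names `SnapshotBuffer` and `max_snapshots`. It is not the PSAM implementation.

```python
import math
from collections import deque


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_probs(weights, obs):
    """Action distribution of a toy linear policy (one weight row per action)."""
    logits = [sum(w * o for w, o in zip(row, obs)) for row in weights]
    return softmax(logits)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


class SnapshotBuffer:
    """Stores past copies of the policy parameters (hypothetical helper)."""

    def __init__(self, max_snapshots=3):
        # Assumption: keep only the most recent snapshots.
        self.snapshots = deque(maxlen=max_snapshots)

    def save(self, weights):
        # Copy each row so later updates do not alter the stored snapshot.
        self.snapshots.append([row[:] for row in weights])

    def intrinsic_reward(self, weights, obs):
        """Mean KL between the current policy and each stored snapshot."""
        if not self.snapshots:
            return 0.0
        current = policy_probs(weights, obs)
        divs = [kl_divergence(current, policy_probs(s, obs))
                for s in self.snapshots]
        return sum(divs) / len(divs)


if __name__ == "__main__":
    buffer = SnapshotBuffer(max_snapshots=2)
    weights = [[0.0, 0.0], [0.0, 0.0]]  # two actions, two observation features
    obs = [1.0, 0.5]

    buffer.save(weights)   # snapshot taken before the update
    weights[0][0] = 1.0    # stand-in for a gradient step
    print(buffer.intrinsic_reward(weights, obs))
```

In this sketch the reward is zero while the current policy agrees with every stored snapshot and grows as the policy's action distribution drifts away from them on the given observation. In a training loop, this value would typically be added to the environment reward, which is again an assumption rather than a detail stated in the abstract.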
Pages: 7
Related papers
50 in total
  • [21] Reward shaping for knowledge-based multi-objective multi-agent reinforcement learning
    Mannion, Patrick
    Devlin, Sam
    Duggan, Jim
    Howley, Enda
    KNOWLEDGE ENGINEERING REVIEW, 2018, 33
  • [22] Decentralized graph-based multi-agent reinforcement learning using reward machines
    Hu, Jueming
    Xu, Zhe
    Wang, Weichang
    Qu, Guannan
    Pang, Yutian
    Liu, Yongming
    NEUROCOMPUTING, 2024, 564
  • [23] Multi-agent reinforcement learning based on self-satisfaction in sparse reward scenarios
    Fang, Baofu
    Tang, Dandan
    Wang, Zaijun
    Wang, Hao
    INTERNATIONAL JOURNAL OF BIO-INSPIRED COMPUTATION, 2025, 25 (01)
  • [24] Reward-Filtering-Based Credit Assignment for Multi-Agent Deep Reinforcement Learning
    Xu C.
    Yin N.
    Duan S.-H.
    He H.
    Wang R.
    Jisuanji Xuebao/Chinese Journal of Computers, 2022, 45 (11): 2306-2320
  • [25] Reward-Poisoning Attacks on Offline Multi-Agent Reinforcement Learning
    Wu, Young
    McMahan, Jeremy
    Zhu, Xiaojin
    Xie, Qiaomin
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 9, 2023: 10426-10434
  • [26] Multi-agent reinforcement learning with synchronized and decomposed reward automaton synthesized from reactive temporal logic
    Zhu, Chenyang
    Zhu, Jinyu
    Si, Wen
    Wang, Xueyuan
    Wang, Fang
    KNOWLEDGE-BASED SYSTEMS, 2024, 306
  • [27] Distributional Reward Estimation for Effective Multi-Agent Deep Reinforcement Learning
    Hu, Jifeng
    Sun, Yanchao
    Chen, Hechang
    Huang, Sili
    Piao, Haiyin
    Chang, Yi
    Sun, Lichao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [28] Reward design for driver repositioning using multi-agent reinforcement learning
    Shou, Zhenyu
    Di, Xuan
    TRANSPORTATION RESEARCH PART C-EMERGING TECHNOLOGIES, 2020, 119
  • [29] Multi-Agent Reinforcement Learning for Problems with Combined Individual and Team Reward
    Sheikh, Hassam Ullah
    Boloni, Ladislau
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
  • [30] Scalable Multi-Agent Reinforcement Learning for Networked Systems with Average Reward
    Qu, Guannan
    Lin, Yiheng
    Wierman, Adam
    Li, Na
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33