Temporal Inconsistency-Based Intrinsic Reward for Multi-Agent Reinforcement Learning

Cited by: 0

Authors:

Sun, Shaoqi [1]

Xu, Kele [1]

Affiliations:

[1] Natl Univ Def Technol, Natl Key Lab Parallel & Distributed Proc, Changsha, Peoples R China

Source:

2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN | 2023

Keywords:

DOI:

10.1109/IJCNN54540.2023.10191420

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Multi-agent reinforcement learning (MARL) has shown promising results in many challenging sequential decision-making tasks, and deep neural networks have recently come to dominate the field. However, the agents' policy networks may fall into local optima during training, which severely constrains exploration. To address this issue, we propose a novel MARL framework named PSAM, which contains a new temporal inconsistency-based intrinsic reward and a diversity control strategy. Specifically, we save the parameters of the deep models along the optimization path of each agent's policy network; these saved parameters are referred to as snapshots. We then measure the difference between snapshots and use it as an intrinsic reward. Moreover, we propose a diversity control strategy to further improve performance. Finally, to verify the effectiveness of the proposed method, we conduct extensive experiments in several widely used MARL environments. The results show that, in many environments, PSAM not only achieves state-of-the-art performance and prevents the policy network from getting stuck in local minima but also accelerates the agent's learning of the policy. Notably, the proposed regularizer can be used in a plug-and-play manner without introducing any additional hyper-parameters or training costs.
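
The snapshot idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how the difference between snapshots is measured, so the sketch assumes it is the mean KL divergence between the action distributions of the latest snapshot and each earlier one on the same observation. The linear policy and the names `policy_probs` and `temporal_inconsistency_reward` are hypothetical and exist only for illustration.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def policy_probs(weights, obs):
    """Hypothetical linear policy: action probabilities for one observation."""
    return softmax(obs @ weights)


def temporal_inconsistency_reward(snapshots, obs, eps=1e-8):
    """Mean KL divergence from the latest snapshot to each earlier snapshot.

    Assumption (not from the source): the difference between snapshots is
    measured in the space of action distributions, using KL divergence.
    Returns 0.0 when fewer than two snapshots are available.
    """
    if len(snapshots) < 2:
        return 0.0
    latest = policy_probs(snapshots[-1], obs)
    kls = []
    for weights in snapshots[:-1]:
        earlier = policy_probs(weights, obs)
        kl = np.sum(latest * (np.log(latest + eps) - np.log(earlier + eps)))
        kls.append(float(kl))
    return float(np.mean(kls))


# Toy usage: three parameter snapshots saved along a made-up optimization path.
rng = np.random.default_rng(0)
obs = rng.normal(size=4)                      # one observation with 4 features
w0 = rng.normal(size=(4, 3))                  # 3 discrete actions
snapshots = [
    w0,
    w0 + 0.1 * rng.normal(size=(4, 3)),
    w0 + 0.5 * rng.normal(size=(4, 3)),
]
r_int = temporal_inconsistency_reward(snapshots, obs)
r_same = temporal_inconsistency_reward([w0, w0.copy()], obs)
```

In this sketch, identical snapshots yield a reward of zero, and the reward grows as the policy's behavior on the given observation drifts between saved checkpoints. How PSAM combines this signal with the environment reward, and how its diversity control strategy works, is not described in the abstract and is left out here.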


Pages: 7
