Probabilistic Reward-Based Reinforcement Learning for Multi-Agent Pursuit and Evasion

Cited by: 1

Authors:

Zhang, Bo-Kun [1]

Hu, Bin [1]

Chen, Long [1]

Zhang, Ding-Xue [2]

Cheng, Xin-Ming [3]

Guan, Zhi-Hong [1]

Affiliations:

[1] Huazhong Univ Sci & Technol, Sch Artificial Intelligence & Automat, Wuhan 430074, Peoples R China

[2] Yangtze Univ, Sch Petr Engn, Jingzhou 434023, Peoples R China

[3] Cent South Univ, Sch Automat, Changsha 430083, Peoples R China

Source:

PROCEEDINGS OF THE 33RD CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2021) | 2021

Keywords:

Reinforcement learning; Multi-agent; Pursuit-evasion; Probabilistic reward; SYSTEMS;

DOI:

10.1109/CCDC52312.2021.9601771

Chinese Library Classification:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

This article studies reinforcement learning for multi-agent pursuit-evasion games. The main problem with current multi-agent reinforcement learning is the low learning efficiency of the agents. An important factor behind this problem is that the delay of the Q function is tied to changes in the environment. To address this, a probabilistic distribution of the reward value replaces the Q function in the multi-agent deep deterministic policy gradient (MADDPG) framework. The distributional Bellman equation is proved to be convergent and can be incorporated into the reinforcement learning algorithm. The probabilistic reward distribution is updated within the algorithm, so that the reward value adapts better to complex environments. At the same time, eliminating the reward delay improves the efficiency of the strategy and yields better pursuit-evasion results. Simulations and experiments show that the multi-agent algorithm with distributional rewards achieves better results in the given environment.
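The abstract does not say how the reward distribution is parameterized or updated. As a minimal sketch only, assuming a categorical distribution over a fixed set of evenly spaced atoms (a common choice in distributional reinforcement learning, and not necessarily the authors' method), one distributional Bellman backup can be written as a shift-and-project step. The function name, the support bounds `v_min` and `v_max`, and the atom count are all hypothetical.

```python
import math


def project_bellman_target(probs, reward, gamma, v_min, v_max):
    """Project the distribution of reward + gamma * Z onto fixed atoms.

    probs holds the probabilities of the next-state return distribution Z
    over len(probs) evenly spaced atoms in [v_min, v_max] (an assumption
    of this sketch). Returns probabilities on the same atoms.
    """
    n = len(probs)
    delta = (v_max - v_min) / (n - 1)
    projected = [0.0] * n
    for j, p in enumerate(probs):
        z_j = v_min + j * delta
        # Bellman shift of atom j, clipped to the support.
        tz = min(max(reward + gamma * z_j, v_min), v_max)
        # Fractional atom index, guarded against rounding past the ends.
        b = min(max((tz - v_min) / delta, 0.0), float(n - 1))
        lo = int(math.floor(b))
        hi = min(lo + 1, n - 1)
        frac = b - lo
        # Split the mass between the two neighbouring atoms.
        projected[lo] += p * (1.0 - frac)
        projected[hi] += p * frac
    return projected


def expected_value(probs, v_min, v_max):
    """Mean of a categorical distribution over the fixed atoms."""
    delta = (v_max - v_min) / (len(probs) - 1)
    return sum(p * (v_min + j * delta) for j, p in enumerate(probs))


# Usage: all mass on the atom at 0, reward 0.5, no discounting.
target = project_bellman_target([0.0, 0.0, 1.0, 0.0, 0.0], 0.5, 1.0, -2.0, 2.0)
```

In this usage the shifted atom lands halfway between the atoms at 0 and 1, so the mass is split evenly between them and the mean of the projected distribution equals the shifted return.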


Pages: 3352 - 3357

Page count: 6

50 items in total

[41] Reinforcement learning based on multi-agent in RoboCup
Zhang, W
Li, JG
Ruan, XG
ADVANCES IN INTELLIGENT COMPUTING, PT 1, PROCEEDINGS, 2005, 3644 : 967 - 975
[42] Reward Function Design Method for Long Episode Pursuit Tasks Under Polar Coordinate in Multi-Agent Reinforcement Learning
Dong Y.
Cui T.
Zhou Y.
Song X.
Zhu Y.
Dong P.
Journal of Shanghai Jiaotong University (Science), 2024, 29 (04) : 646 - 655
[43] Multi-Agent Reinforcement Learning
Stankovic, Milos
2016 13TH SYMPOSIUM ON NEURAL NETWORKS AND APPLICATIONS (NEUREL), 2016, : 43 - 43
[44] Distributed multi-agent deep reinforcement learning for cooperative multi-robot pursuit
Yu, Chao
Dong, Yinzhao
Li, Yangning
Chen, Yatong
JOURNAL OF ENGINEERING-JOE, 2020, 2020 (13) : 499 - 504
[45] Mobile User Interface Adaptation Based on Usability Reward Model and Multi-Agent Reinforcement Learning
Vidmanov, Dmitry
Alfimtsev, Alexander
MULTIMODAL TECHNOLOGIES AND INTERACTION, 2024, 8 (04)
[46] An Improved Approach towards Multi-Agent Pursuit-Evasion Game Decision-Making Using Deep Reinforcement Learning
Wan, Kaifang
Wu, Dingwei
Zhai, Yiwei
Li, Bo
Gao, Xiaoguang
Hu, Zijian
ENTROPY, 2021, 23 (11)
[47] Decentralized Multi-Agent Reinforcement Learning in Average-Reward Dynamic DCOPs
Duc Thien Nguyen
Yeoh, William
Lau, Hoong Chuin
Zilberstein, Shlomo
Zhang, Chongjie
PROCEEDINGS OF THE TWENTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2014, : 1447 - 1455
[48] Multi-Agent Deep Reinforcement Learning With Progressive Negative Reward for Cryptocurrency Trading
Kumlungmak, Kittiwin
Vateekul, Peerapon
IEEE ACCESS, 2023, 11 : 66440 - 66455
[49] Leaders and Collaborators: Addressing Sparse Reward Challenges in Multi-Agent Reinforcement Learning
Sun, Shaoqi
Liu, Hui
Xu, Kele
Ding, Bo
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024,
[50] Role differentiation process by division of reward function in multi-agent reinforcement learning
Taniguchi, Tadahiro
Tabuchi, Kazuma
Sawaragi, Tetsuo
2008 PROCEEDINGS OF SICE ANNUAL CONFERENCE, VOLS 1-7, 2008, : 358 - +
