Probabilistic Reward-Based Reinforcement Learning for Multi-Agent Pursuit and Evasion

Cited by: 1

Authors:

Zhang, Bo-Kun [1]

Hu, Bin [1]

Chen, Long [1]

Zhang, Ding-Xue [2]

Cheng, Xin-Ming [3]

Guan, Zhi-Hong [1]

Affiliations:

[1] Huazhong Univ Sci & Technol, Sch Artificial Intelligence & Automat, Wuhan 430074, Peoples R China

[2] Yangtze Univ, Sch Petr Engn, Jingzhou 434023, Peoples R China

[3] Cent South Univ, Sch Automat, Changsha 430083, Peoples R China

Source:

PROCEEDINGS OF THE 33RD CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2021) | 2021

Keywords:

Reinforcement learning; Multi-agent; Pursuit-evasion; Probabilistic reward; Systems

DOI:

10.1109/CCDC52312.2021.9601771

CLC number (Chinese Library Classification):

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

This article studies reinforcement learning for multi-agent pursuit-evasion games. The main problem with current multi-agent reinforcement learning is the low learning efficiency of the agents. An important cause is that the Q function's estimates lag behind changes in the environment. To address this, a probabilistic distribution of the reward value replaces the Q function in the multi-agent deep deterministic policy gradient (MADDPG) framework. The distributional Bellman equation is proved to be convergent, so it can be incorporated into the reinforcement learning algorithm. Because the algorithm updates the reward distribution, the reward value adapts better to complex environments. At the same time, eliminating the reward delay improves the efficiency of the learned strategy and yields better pursuit-evasion results. Simulations and experiments show that the multi-agent algorithm with distributional rewards achieves better results in the tested environment.
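The abstract does not spell out the paper's update rule. As a hedged illustration of what a convergent distributional Bellman update can look like, the sketch below uses the categorical projection popularised by C51 (Bellemare, Dabney and Munos, 2017). This is a swapped-in technique, not necessarily the authors' method, and it omits MADDPG's actors and critics entirely. It runs on a hypothetical toy problem: one state, a reward of 0 or 1 with equal probability, and discount γ = 0.9. The functions `project` and `distributional_value_iteration` and all parameter values are assumptions for illustration only.

```python
import math


def project(atoms, probs, rewards, gamma, v_min, v_max):
    """Apply one distributional Bellman backup, r + gamma * Z, and project it
    back onto the fixed, evenly spaced support `atoms` (C51-style projection).
    Illustrative sketch only; not the paper's algorithm."""
    n = len(atoms)
    delta = (v_max - v_min) / (n - 1)
    out = [0.0] * n
    for r, p_r in rewards:                 # reward outcomes and their probabilities
        for z, p_z in zip(atoms, probs):   # current return distribution Z
            tz = min(max(r + gamma * z, v_min), v_max)  # clip to the support
            b = (tz - v_min) / delta                    # fractional atom index
            b = min(max(b, 0.0), n - 1)                 # guard against rounding
            lo, hi = math.floor(b), math.ceil(b)
            mass = p_r * p_z
            if lo == hi:                   # lands exactly on an atom
                out[lo] += mass
            else:                          # split mass between the two neighbours
                out[lo] += mass * (hi - b)
                out[hi] += mass * (b - lo)
    return out


def distributional_value_iteration(rewards, gamma, v_min, v_max, n_atoms, iters):
    """Iterate the projected operator starting from Z_0 = all mass at v_min."""
    delta = (v_max - v_min) / (n_atoms - 1)
    atoms = [v_min + i * delta for i in range(n_atoms)]
    probs = [1.0] + [0.0] * (n_atoms - 1)
    for _ in range(iters):
        probs = project(atoms, probs, rewards, gamma, v_min, v_max)
    return atoms, probs


# Hypothetical toy problem: reward 0 or 1 with equal probability, gamma = 0.9.
rewards = [(0.0, 0.5), (1.0, 0.5)]
atoms, probs = distributional_value_iteration(
    rewards, gamma=0.9, v_min=0.0, v_max=10.0, n_atoms=51, iters=200
)
mean = sum(z * p for z, p in zip(atoms, probs))
print(round(mean, 4))  # 5.0, i.e. E[r] / (1 - gamma)
```

Because linear interpolation between neighbouring atoms preserves the expected value whenever r + γz stays inside [v_min, v_max], the mean of the iterated distribution follows the ordinary Bellman recursion m ← E[r] + γm and settles at 0.5 / 0.1 = 5. The distributional view therefore keeps the scalar value that a Q function would track while also carrying the spread of possible returns.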


Pages: 3352-3357

Number of pages: 6
