Intelligent cooperative exploration path planning for UAV swarm in an unknown environment

Cited by: 0

Authors:

Wang W. [1]

You M. [2]

Sun L. [3]

Zhang X. [1]

Zong Q. [1]

Affiliations:

[1] School of Electrical and Information Engineering, Tianjin University, Tianjin

[2] Shenyang Aircraft Design and Research Institute, Shenyang

[3] Institute of Army Aviation, Beijing

Source:

Gongcheng Kexue Xuebao/Chinese Journal of Engineering | 2024, Vol. 46, No. 7

Keywords:

automatic exploration; deep reinforcement learning; path planning; swarm; unmanned aerial vehicle

DOI:

10.13374/j.issn2095-9389.2023.10.15.002

Abstract:

Owing to the increasing complexity of tasks and the wide variability of environmental conditions, a single unmanned aerial vehicle (UAV) is often insufficient to meet practical mission requirements, and multi-UAV systems have considerable potential in areas such as search and rescue. During search and rescue missions, UAVs acquire the location of the target to be rescued and then plan a path that circumvents obstacles and leads to the target. Traditional path-planning algorithms require prior knowledge of the obstacle distribution on the map, which may be difficult to obtain in real-world missions. To remove this dependence on prior map information, this paper proposes a reinforcement learning-based approach for the collaborative exploration of unknown environments by multiple UAVs. First, a Markov decision process is employed to establish a game model and task objectives for the UAV cluster, considering the characteristics of collaborative exploration tasks and the various constraints of UAV clusters. To maximize the search and rescue success rate, UAVs must satisfy dynamic and obstacle-avoidance constraints during mission execution. Second, the multiagent soft actor–critic (MASAC) algorithm is used to iteratively train the UAVs' collaborative exploration strategies. The actor network generates UAV actions, while the critic network evaluates the quality of these strategies. To enhance the algorithm's generalization capability, training is conducted in randomly generated map environments. To prevent UAVs from being trapped by concave obstacles, a breadth-first search algorithm is used to calculate rewards based on the path distance between the UAVs and the targets rather than the straight-line distance. During exploration, each UAV continuously collects map information and shares it with all other UAVs. Each UAV makes its own action decisions based on the environment and the information obtained from other UAVs, and the mission is considered successful if multiple UAVs hover above the target. Finally, a virtual simulation platform for algorithm validation is developed using the Unity game engine. The proposed algorithm is implemented in PyTorch, and bidirectional interaction between the Unity environment and the Python algorithm is achieved through the ML-Agents (Machine Learning Agents) framework. Comparative experiments on the virtual simulation platform evaluate the proposed algorithm against a non-cooperative single-agent SAC algorithm. The proposed method shows advantages in task success rate, task completion efficiency, and episode rewards, supporting the feasibility and effectiveness of the approach. © 2024 Science Press. All rights reserved.
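The abstract states that rewards are computed from a breadth-first-search path distance rather than the straight-line distance, so that UAVs are not trapped by concave obstacles. The sketch below illustrates that idea only; it is not the paper's implementation. The occupancy-grid representation, the 4-connected moves, and the names `bfs_path_distance`, `distance_reward`, and `scale` are all assumptions made for illustration, since the abstract does not give the actual reward formula.

```python
from collections import deque

def bfs_path_distance(grid, start, goal):
    """Number of 4-connected steps from start to goal, or None if unreachable.

    Assumption: grid[r][c] == 1 marks an obstacle cell and 0 a free cell.
    """
    rows, cols = len(grid), len(grid[0])
    if grid[start[0]][start[1]] or grid[goal[0]][goal[1]]:
        return None
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return dist[(r, c)]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return None

def distance_reward(grid, prev_pos, new_pos, goal, scale=1.0):
    """Hypothetical shaping term: positive when the path distance shrinks."""
    before = bfs_path_distance(grid, prev_pos, goal)
    after = bfs_path_distance(grid, new_pos, goal)
    if before is None or after is None:
        return 0.0
    return scale * (before - after)

# A concave (U-shaped) obstacle that opens downward; the UAV sits inside it
# at (2, 2) and the target is directly above at (0, 2).
GRID = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]

path_len = bfs_path_distance(GRID, (2, 2), (0, 2))
step_reward = distance_reward(GRID, (2, 2), (3, 2), (0, 2))
```

In this example the target is only two cells away in a straight line, but the wall at (1, 2) blocks the direct route. Moving down to (3, 2) increases the straight-line distance yet shortens the obstacle-aware path, so a path-distance reward encourages the UAV to back out of the pocket, whereas a straight-line reward would penalize that move.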

Citation

Pages: 1197–1206

Page count: 9

