A Hierarchical Deep Reinforcement Learning Framework for 6-DOF UCAV Air-to-Air Combat

Cited by: 24

Authors:

Chai, Jiajun [1,2]

Chen, Wenzhang [1,2]

Zhu, Yuanheng [1,2]

Yao, Zong-Xin [3]

Zhao, Dongbin [1,2]

Affiliations:

[1] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence S, Beijing 100190, Peoples R China

[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China

[3] Shenyang Aircraft Design & Res Inst, Dept Unmanned Aerial Vehicle, Shenyang 110035, Peoples R China

Source:

IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS | 2023, Vol. 53, No. 9

Funding:

National Natural Science Foundation of China;

Keywords:

Aircraft; Aerospace control; 6-DOF; Task analysis; Nose; Missiles; Heuristic algorithms; 6-DOF unmanned combat air vehicle (UCAV); air combat; hierarchical structure; reinforcement learning (RL); self-play; LEVEL; GAME;

DOI:

10.1109/TSMC.2023.3270444

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Unmanned combat air vehicle (UCAV) combat is a challenging scenario with high-dimensional continuous state and action spaces and highly nonlinear dynamics. In this article, we propose a general hierarchical framework to solve the within-visual-range (WVR) air-to-air combat problem under six-degree-of-freedom (6-DOF) dynamics. The core idea is to divide the whole decision-making process into two loops and use reinforcement learning (RL) to solve them separately. The outer loop uses a combat policy to decide the macro command according to the current combat situation. The inner loop then uses a control policy to execute the macro command by calculating the actual input signals for the aircraft. We design the Markov decision process for the control policy and the Markov game between the two aircraft, and we present a two-stage training mechanism. For the control policy, we design an effective reward function to accurately track various macro behaviors. For the combat policy, we present a fictitious self-play mechanism that improves combat performance by fighting against historical combat policies. Experimental results show that the control policy achieves better tracking performance than conventional methods. The fictitious self-play mechanism learns a competitive combat policy that achieves high winning rates against conventional methods.
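The two-loop structure and the pool of historical opponents described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `hierarchical_step` and `OpponentPool`, the `inner_steps` parameter, the concatenated observation layout, and uniform sampling over past snapshots are all assumptions made for illustration, and the policies are plain callables standing in for trained networks.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Policy = Callable[[Vector], List[float]]  # hypothetical policy interface


def hierarchical_step(
    combat_policy: Policy,
    control_policy: Policy,
    state: Vector,
    env_step: Callable[[Vector, Vector], List[float]],
    inner_steps: int = 10,  # assumed ratio of inner-loop to outer-loop rate
) -> Tuple[List[float], List[float]]:
    """Run one outer-loop decision followed by several inner-loop steps."""
    # Outer loop: the combat policy maps the situation to a macro command.
    macro_command = list(combat_policy(state))
    state = list(state)
    for _ in range(inner_steps):
        # Inner loop: the control policy sees the state plus the macro
        # command and outputs the actual input signals for the aircraft.
        control_input = control_policy(state + macro_command)
        state = list(env_step(state, control_input))
    return state, macro_command


@dataclass
class OpponentPool:
    """Stores historical combat policies to train against (assumed design)."""

    snapshots: List[Policy] = field(default_factory=list)

    def add(self, policy: Policy) -> None:
        # Save a frozen copy of the current combat policy.
        self.snapshots.append(policy)

    def sample(self, rng: random.Random) -> Policy:
        # Uniform sampling over history is an assumption, not a detail
        # taken from the abstract.
        if not self.snapshots:
            raise ValueError("opponent pool is empty")
        return rng.choice(self.snapshots)
```

In a training loop along these lines, the control policy would be trained first to track macro commands, and the combat policy would then be trained against opponents drawn from the pool, with new snapshots added periodically.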

Pages: 5417-5429

Page count: 13
