Deep deterministic policy gradient algorithm for UAV control

Authors
Huang X. [1 ,2 ]
Liu J. [1 ,2 ]
Jia C. [1 ,2 ]
Wang Z. [1 ,2 ]
Zhang J. [1 ,2 ]
Affiliations
[1] Beijing Aerospace Automatic Control Institute, Beijing
[2] National Key Laboratory of Science and Technology on Aerospace Intelligent Control, Beijing
Funding
National Natural Science Foundation of China
Keywords
Deep deterministic policy gradient; End-to-end; Flight control; Small UAV; Sparse reward
DOI
10.7527/S1000-6893.2020.24688
Abstract
The deep deterministic policy gradient (DDPG) algorithm is used to train an agent to learn the flight control strategy of a small UAV. The velocity, position, and attitude angles over multiple data frames are taken as the observation state of the agent, the rudder deflection angles and engine thrust command as its output actions, and the nonlinear model and flight environment of the UAV as its learning environment. During the interaction between the agent and the environment, sparse rewards are granted when specific goals are achieved, in addition to dense penalties that encode error information, thereby effectively improving the diversity of the flight data samples and enhancing the learning efficiency of the agent. The agent ultimately realizes end-to-end flight control, mapping position, velocity, and attitude angles directly to the control variables. Flight control simulations are further carried out under conditions of varying track points, model parameter deviations, disturbances, and faults. Simulation results show that the agent not only completes the training task effectively, but also handles a variety of flight tasks not encountered during training, demonstrating excellent generalization ability and indicating the research and engineering reference value of the method. © 2021, Beihang University Aerospace Knowledge Press. All rights reserved.
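To make the training setup described in the abstract concrete, the following is a minimal sketch of the reward shaping (dense error penalty plus sparse goal bonus) and the DDPG actor-critic update, assuming a PyTorch implementation. The frame count, state layout, network widths, reward weights, and hyperparameters (gamma, tau) are hypothetical placeholders, not values from the paper.

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical dimensions: 3 stacked frames, each holding
# velocity (3), position (3), and attitude angles (3).
FRAMES, STATE_PER_FRAME = 3, 9
OBS_DIM = FRAMES * STATE_PER_FRAME   # 27-dim stacked observation
ACT_DIM = 4                          # e.g. 3 rudder deflections + 1 thrust command


def reward(pos_err, att_err, goal_reached,
           w_pos=1.0, w_att=0.1, goal_bonus=100.0):
    """Dense penalty on tracking errors plus a sparse bonus on goal completion.
    Weights and the bonus magnitude are illustrative, not taken from the paper."""
    dense_penalty = -(w_pos * np.linalg.norm(pos_err)
                      + w_att * np.linalg.norm(att_err))
    sparse_bonus = goal_bonus if goal_reached else 0.0
    return dense_penalty + sparse_bonus


class Actor(nn.Module):
    """Maps the stacked observation directly to control commands (end-to-end)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, ACT_DIM), nn.Tanh(),  # commands normalized to [-1, 1]
        )

    def forward(self, obs):
        return self.net(obs)


class Critic(nn.Module):
    """Estimates Q(s, a) for the deterministic policy gradient."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + ACT_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))


def ddpg_update(actor, critic, actor_t, critic_t, batch,
                opt_actor, opt_critic, gamma=0.99, tau=0.005):
    """One DDPG step: TD update for the critic, deterministic policy gradient
    for the actor, and Polyak averaging of the target networks. `batch` holds
    tensors (obs, act, rew, next_obs, done); rew and done are shaped (batch, 1)."""
    obs, act, rew, next_obs, done = batch

    # Critic: regress Q(s, a) toward the bootstrapped target.
    with torch.no_grad():
        target_q = rew + gamma * (1.0 - done) * critic_t(next_obs, actor_t(next_obs))
    critic_loss = F.mse_loss(critic(obs, act), target_q)
    opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()

    # Actor: ascend the critic's estimate of Q(s, pi(s)).
    actor_loss = -critic(obs, actor(obs)).mean()
    opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()

    # Slowly track the online networks with the targets.
    with torch.no_grad():
        for p, pt in zip(actor.parameters(), actor_t.parameters()):
            pt.mul_(1.0 - tau).add_(tau * p)
        for p, pt in zip(critic.parameters(), critic_t.parameters()):
            pt.mul_(1.0 - tau).add_(tau * p)
```

Combining the dense penalty with the sparse bonus lets the agent receive a learning signal on every step while still being rewarded for full goal completion, which is consistent with the sample-diversity and learning-efficiency effect the abstract describes.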