Model-Free Guidance Method for Drones in Complex Environments Using Direct Policy Exploration and Optimization

Cited by: 3
Authors
Liu, Hongxun [1]
Suzuki, Satoshi [1]
Affiliations
[1] Chiba Univ, Sch Sci & Engn, 1-33 Yayoi Cho, Inage Ku, Chiba 2638522, Japan
Keywords
drones; reinforcement learning; policy optimization; model-free; traverse complex environments
DOI
10.3390/drones7080514
Chinese Library Classification
TP7 [Remote Sensing Technology]
Subject Classification Codes
081102; 0816; 081602; 083002; 1404
Abstract
In the past few decades, drones have become lighter, with longer flight times and more agile performance. To maximize their capabilities during flight in complex environments, researchers have proposed various model-based perception, planning, and control methods that decompose the problem into modules and accomplish the task sequentially and collaboratively. In practical environments, however, it is extremely difficult to model both the drone and its surroundings accurately, which severely limits the applicability of such model-based methods. In this study, we propose a novel model-free, reinforcement-learning-based method that learns the optimal planning and control policy from experienced flight data. During the training phase, the policy takes the complete state of the drone and environmental information as inputs and self-optimizes with respect to a predefined reward function. In practical deployment, the policy takes inputs from onboard and external sensors and outputs optimal control commands to low-level velocity controllers in an end-to-end manner. Because of this property, the planning and control policy can be improved without an accurate system model and can drive drones through complex environments at high speed. The policy was trained and tested in a simulator as well as in real-world flight experiments, demonstrating its practical applicability. The results show that this model-free method learns to fly effectively and holds great potential for handling different tasks and environments.
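The abstract does not specify the learning algorithm, network architecture, or observation encoding, so the following is only a minimal sketch of the end-to-end setup it describes: a model-free policy maps the drone's state plus an environment observation to a velocity command for the low-level controller, and is improved by a policy-gradient update driven by a predefined reward. The REINFORCE-style update, the use of PyTorch, and all dimensions and names (GuidancePolicy, STATE_DIM, reinforce_update, etc.) are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a model-free, end-to-end guidance policy as described
# in the abstract. All dimensions, names, and the REINFORCE-style update
# are assumptions for illustration; the paper does not specify them here.
import torch
import torch.nn as nn

STATE_DIM = 12   # assumed: drone pose, velocity, and attitude
ENV_DIM = 32     # assumed: encoded obstacle/environment observation
CMD_DIM = 3      # velocity command sent to the low-level controller

class GuidancePolicy(nn.Module):
    """Gaussian policy: (drone state + environment) -> velocity command."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ENV_DIM, 128), nn.Tanh(),
            nn.Linear(128, 128), nn.Tanh(),
            nn.Linear(128, CMD_DIM),
        )
        self.log_std = nn.Parameter(torch.zeros(CMD_DIM))

    def dist(self, obs):
        mean = self.net(obs)
        return torch.distributions.Normal(mean, self.log_std.exp())

def reinforce_update(policy, optimizer, trajectory, gamma=0.99):
    """One policy-gradient step from a rollout of (obs, action, reward)."""
    returns, g = [], 0.0
    for _, _, r in reversed(trajectory):          # discounted return-to-go
        g = r + gamma * g
        returns.append(g)
    returns = torch.tensor(list(reversed(returns)))
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    loss = 0.0
    for (obs, act, _), g in zip(trajectory, returns):
        # Maximize expected return = minimize negative log-prob * return.
        loss = loss - policy.dist(obs).log_prob(act).sum() * g
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Toy usage: random tensors stand in for simulator or sensor rollouts.
policy = GuidancePolicy()
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
traj = []
for _ in range(50):
    obs = torch.randn(STATE_DIM + ENV_DIM)
    act = policy.dist(obs).sample()
    traj.append((obs, act, float(torch.rand(()))))  # reward placeholder
reinforce_update(policy, opt, traj)
```

In the paper's actual pipeline, the random observations and scalar rewards above would come from the simulator during training and from onboard and external sensors at deployment, with the predefined reward function of the paper replacing the placeholder.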
Pages: 19