Trajectory Planning With Deep Reinforcement Learning in High-Level Action Spaces

Cited by: 7

Authors:

Williams, Kyle R. [1]

Schlossman, Rachel [1]

Whitten, Daniel [1]

Ingram, Joe

Musuvathy, Srideep [1]

Pagan, James [1]

Williams, Kyle A. [1]

Green, Sam [2]

Patel, Anirudh [2]

Mazumdar, Anirban [3]

Parish, Julie [1]

Affiliations:

[1] Sandia Natl Labs, Albuquerque, CA 94551 USA

[2] Semiot Labs, Los Altos, CA 94022 USA

[3] Georgia Inst Technol, Atlanta, GA 30332 USA

Source:

IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS | 2023, Vol. 59, No. 3

Keywords:

Trajectory; Planning; Trajectory planning; Training; Reinforcement learning; Optimization; Aerodynamics

DOI:

10.1109/TAES.2022.3218496

Chinese Library Classification (CLC) number:

V [Aviation, Aerospace]

Discipline classification code:

08; 0825

Abstract:

This article presents a technique for trajectory planning based on parameterized high-level actions. These high-level actions are subtrajectories that have variable shape and duration. The use of high-level actions can improve the performance of guidance algorithms. Specifically, we show how the use of high-level actions improves the performance of guidance policies that are generated via reinforcement learning (RL). RL has shown great promise for solving complex control, guidance, and coordination problems but can still suffer from long training times and poor performance. This work shows how the use of high-level actions reduces the required number of training steps and increases the path performance of an RL-trained guidance policy. We demonstrate the method on a space-shuttle guidance example. We show the proposed method increases the path performance (latitude range) by 18% compared with a baseline RL implementation. Similarly, we show the proposed method achieves steady state during training with approximately 75% fewer training steps. We also show how the guidance policy enables effective performance in an obstacle field. Finally, this article develops a loss function term for policy-gradient-based deep RL, which is analogous to an antiwindup mechanism in feedback control. We demonstrate that the inclusion of this term in the underlying optimization increases the average policy return in our numerical example.

Pages: 2513-2529

Number of pages: 17
