Tracking control of AUV via novel soft actor-critic and suboptimal demonstrations

Times cited: 4

Authors:

Zhang, Yue [1]

Zhang, Tianze [1]

Li, Yibin [1]

Zhuang, Yinghao [1]

Affiliation:

[1] Shandong Univ, Inst Marine Sci & Technol, Qingdao 266237, Shandong, Peoples R China

Source:

OCEAN ENGINEERING | 2024 / Vol. 293

Funding:

National Natural Science Foundation of China;

Keywords:

Autonomous underwater vehicle (AUV); Tracking control; Reinforcement learning (RL); Suboptimal demonstration; Soft actor-critic (SAC); Recurrent neural network (RNN);

DOI:

10.1016/j.oceaneng.2023.116540

CLC number (Chinese Library Classification):

U6 [Waterway transportation]; P75 [Ocean engineering];

Discipline classification codes:

0814; 081505; 0824; 082401;

Abstract:

Tracking control for autonomous underwater vehicles (AUVs) faces multifaceted challenges, which makes acquiring optimal demonstrations a daunting task, and suboptimal demonstrations lead to lower tracking accuracy. To address the problem of learning from suboptimal demonstrations, this paper proposes a model-free reinforcement learning (RL) method. Our approach uses suboptimal demonstrations to obtain an initial controller, which is then iteratively refined during training. Because the demonstrations are suboptimal, they are removed from the replay buffer once it reaches capacity. Building on the soft actor-critic (SAC), our approach integrates a recurrent neural network (RNN) into the policy network to capture the relationship between states and actions. Moreover, we introduce logarithmic and cosine functions into the reward function to improve training effectiveness. Finally, we validate the effectiveness of the proposed Initialize Controller from Demonstrations (ICfD) algorithm through simulations on two reference trajectories. We also provide a definition of tracking success. The success rates of ICfD on the two reference trajectories are 95.60% and 94.05%, respectively, surpassing those of the state-of-the-art RL method SACfD (80.03% and 90.55%). The average one-step distance errors of ICfD are 1.20 m and 0.76 m, respectively, significantly lower than those of the S-plane controller (9.725 m and 8.325 m). In addition, we evaluate the generalization of the ICfD controller in different scenarios.
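The abstract states only that the suboptimal demonstrations are removed from the replay buffer once it reaches capacity. The following is a minimal Python sketch of one way to obtain that behavior, assuming a first-in-first-out buffer that is seeded with demonstration transitions before any agent experience is added, so that the agent's own transitions overwrite the demonstrations first. The class and method names (`DemoSeededReplayBuffer`, `load_demonstrations`, `num_demos`) are hypothetical, and the paper's actual buffer may be implemented differently.

```python
import random
from collections import deque
from typing import Any, Deque, List, Tuple

# Generic transition: (state, action, reward, next_state, done).
Transition = Tuple[Any, Any, float, Any, bool]


class DemoSeededReplayBuffer:
    """Hypothetical FIFO replay buffer seeded with demonstrations.

    Demonstrations are inserted before any agent transitions, so once the
    buffer is full they are the oldest entries and are evicted first.
    """

    def __init__(self, capacity: int) -> None:
        # Each entry is (transition, is_demo); deque(maxlen=...) drops the
        # leftmost (oldest) entry automatically when a new one is appended
        # to a full buffer.
        self._buf: Deque[Tuple[Transition, bool]] = deque(maxlen=capacity)

    def load_demonstrations(self, demos: List[Transition]) -> None:
        # Seed the buffer with (suboptimal) demonstration transitions.
        for t in demos:
            self._buf.append((t, True))

    def add(self, transition: Transition) -> None:
        # Agent transition; once the buffer is full this evicts the oldest
        # entry, which is a demonstration while any demonstrations remain.
        self._buf.append((transition, False))

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling over demonstrations and agent data still stored.
        batch = random.sample(list(self._buf), min(batch_size, len(self._buf)))
        return [t for t, _ in batch]

    def num_demos(self) -> int:
        # How many demonstration transitions have not yet been evicted.
        return sum(1 for _, is_demo in self._buf if is_demo)

    def __len__(self) -> int:
        return len(self._buf)


if __name__ == "__main__":
    buf = DemoSeededReplayBuffer(capacity=5)
    buf.load_demonstrations([(i, 0, 0.0, i + 1, False) for i in range(3)])
    for i in range(4):
        buf.add((10 + i, 1, -1.0, 11 + i, False))
    print(len(buf), buf.num_demos())  # 5 1
```

With a capacity of 5 and 3 demonstrations, the first two agent transitions fill the buffer, and each later one evicts a demonstration; after enough agent steps, the buffer holds only the agent's own experience. This lets the demonstrations shape the initial controller without being kept permanently, which matches the abstract's motivation that the demonstrations are suboptimal.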

Pages: 13