Tracking control of AUV via novel soft actor-critic and suboptimal demonstrations

Times Cited: 4
Authors
Zhang, Yue [1 ]
Zhang, Tianze [1 ]
Li, Yibin [1 ]
Zhuang, Yinghao [1 ]
Affiliations
[1] Shandong Univ, Inst Marine Sci & Technol, Qingdao 266237, Shandong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Autonomous underwater vehicle (AUV); Tracking control; Reinforcement learning (RL); Suboptimal demonstration; Soft actor-critic (SAC); Recurrent neural network (RNN);
DOI
10.1016/j.oceaneng.2023.116540
Chinese Library Classification (CLC)
U6 [Water Transportation]; P75 [Ocean Engineering];
Discipline Classification Codes
0814 ; 081505 ; 0824 ; 082401 ;
Abstract
Tracking control for autonomous underwater vehicles (AUVs) faces multifaceted challenges, making the acquisition of optimal demonstrations a daunting task, and suboptimal demonstrations lead to lower tracking accuracy. To address the issue of learning from suboptimal demonstrations, this paper proposes a model-free reinforcement learning (RL) method. Our approach utilizes suboptimal demonstrations to obtain an initial controller, which is iteratively refined during training. Given their suboptimal nature, demonstrations are removed from the replay buffer once it reaches capacity. Building upon the soft actor-critic (SAC) framework, our approach integrates a recurrent neural network (RNN) into the policy network to capture the relationship between states and actions. Moreover, we introduce logarithmic and cosine terms into the reward function to enhance training effectiveness. Finally, we validate the effectiveness of the proposed Initialize Controller from Demonstrations (ICfD) algorithm through simulations with two reference trajectories, under a stated definition of tracking success. The success rates of ICfD on the two reference trajectories are 95.60% and 94.05%, respectively, surpassing the state-of-the-art RL method SACfD (80.03% and 90.55%). The average one-step distance errors of ICfD are 1.20 m and 0.76 m, respectively, significantly lower than those of the S-plane controller (9.725 m and 8.325 m). In addition, we evaluate the generalization of the ICfD controller in different scenarios.
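Two ideas sketched in the abstract, pre-filling a fixed-capacity replay buffer with suboptimal demonstrations that are evicted once the buffer fills, and shaping the reward with logarithmic and cosine terms, can be illustrated with a minimal Python sketch. This is not the authors' implementation; the names DemoSeededReplayBuffer and shaped_reward, and the specific reward form and constants, are illustrative assumptions.

import collections
import math
import random

# Minimal illustrative sketch, not the paper's code.
class DemoSeededReplayBuffer:
    """Fixed-capacity FIFO buffer pre-filled with suboptimal demonstration
    transitions; once enough on-policy experience arrives, the oldest entries
    (the demonstrations) are dropped automatically."""

    def __init__(self, capacity, demo_transitions):
        self.buffer = collections.deque(demo_transitions, maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        # New on-policy transitions gradually push the demonstrations out.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling over whatever mix of demo / on-policy data remains
        # (copied to a list for simplicity in this sketch).
        return random.sample(list(self.buffer), batch_size)


def shaped_reward(distance_error, heading_error, eps=1e-3):
    """Hypothetical reward with logarithmic and cosine terms, in the spirit of
    the abstract; the paper's exact form and weights may differ."""
    log_term = -math.log(distance_error + eps)  # steep gradient near zero error
    cos_term = math.cos(heading_error)          # rewards heading alignment
    return log_term + cos_term

Because sampling is uniform over the deque, the influence of the demonstrations decays naturally as on-policy data accumulates and finally vanishes when the buffer reaches capacity, which matches the abstract's description of discarding the suboptimal demonstrations.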
Pages: 13
Related Papers
50 records in total
  • [1] Simultaneous Control and Guidance of an AUV Based on Soft Actor-Critic
    Sola, Yoann
    Le Chenadec, Gilles
    Clement, Benoit
    SENSORS, 2022, 22 (16)
  • [2] A Novel Approach for Train Tracking in Virtual Coupling Based on Soft Actor-Critic
    Chen, Bin
    Zhang, Lei
    Cheng, Gaoyun
    Liu, Yiqing
    Chen, Junjie
    ACTUATORS, 2023, 12 (12)
  • [3] Speed Tracking Control via Online Continuous Actor-Critic learning
    Huang, Zhenhua
    Xu, Xin
    Sun, Zhenping
    Tan, Jun
    Qian, Lilin
    PROCEEDINGS OF THE 35TH CHINESE CONTROL CONFERENCE 2016, 2016, : 3172 - 3177
  • [4] Actor-Critic Reinforcement Learning for Tracking Control in Robotics
    Pane, Yudha P.
    Nageshrao, Subramanya P.
    Babuska, Robert
    2016 IEEE 55TH CONFERENCE ON DECISION AND CONTROL (CDC), 2016, : 5819 - 5826
  • [5] Characterizing Motor Control of Mastication With Soft Actor-Critic
    Abdi, Amir H.
    Sagl, Benedikt
    Srungarapu, Venkata P.
    Stavness, Ian
    Prisman, Eitan
    Abolmaesumi, Purang
    Fels, Sidney
    FRONTIERS IN HUMAN NEUROSCIENCE, 2020, 14
  • [6] Path Planning and Tracking Control for Parking via Soft Actor-Critic Under Non-Ideal Scenarios
    Tang, Xiaolin
    Yang, Yuyou
    Liu, Teng
    Lin, Xianke
    Yang, Kai
    Li, Shen
    IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2024, 11 (01) : 181 - 195
  • [7] Accelerating Fuzzy Actor-Critic Learning via Suboptimal Knowledge for a Multi-Agent Tracking Problem
    Wang, Xiao
    Ma, Zhe
    Mao, Lei
    Sun, Kewu
    Huang, Xuhui
    Fan, Changchao
    Li, Jiake
    ELECTRONICS, 2023, 12 (08)
  • [8] End-to-End AUV Motion Planning Method Based on Soft Actor-Critic
    Yu, Xin
    Sun, Yushan
    Wang, Xiangbin
    Zhang, Guocheng
    SENSORS, 2021, 21 (17)
  • [9] Generative Adversarial Soft Actor-Critic
    Hwang, Hyo-Seok
    Kim, Yoojoong
    Seok, Junhee
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,