A Novel Multi-Feature Fusion Model Based on Pre-Trained Wav2vec 2.0 for Underwater Acoustic Target Recognition

Cited by: 0
Authors
Pu, Zijun [1]
Zhang, Qunfei [1]
Xue, Yangtao [1]
Zhu, Peican [2]
Cui, Xiaodong [1]
Affiliations
[1] Northwestern Polytech Univ, Sch Marine Sci & Technol, Xian 710072, Peoples R China
[2] Northwestern Polytech Univ, Sch Artificial Intelligence Opt & Elect iOPEN, Xian 710072, Peoples R China
Keywords
underwater acoustic target recognition; deep learning; multi-feature fusion; wav2vec 2.0; CQT; Mel-spectrogram
DOI
10.3390/rs16132442
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Although recent data-driven Underwater Acoustic Target Recognition (UATR) methods have played a dominant role in marine acoustics, they are hampered by complex ocean environments and rather small datasets. To tackle these challenges, researchers have turned to transfer learning for UATR tasks. However, existing pre-trained models are trained on speech audio and are not well suited to underwater acoustic data, so they must be further optimized before they can be applied to the UATR task. Here, we propose a novel UATR framework called Attention Layer Supplement Integration (ALSI), which integrates large pre-trained neural networks with attention modules customized for acoustic data. Specifically, the ALSI model consists of two important modules: Scale ResNet and Residual Hybrid Attention Fusion (RHAF). First, the Scale ResNet module takes the Constant-Q transform (CQT) feature as input to obtain relatively important frequency information. Next, RHAF takes the temporal feature extracted by wav2vec 2.0 and the frequency feature extracted by Scale ResNet as input and uses the attention mechanism to better integrate the time-frequency features with the temporal feature. The RHAF module helps wav2vec 2.0, which is trained on speech data, adapt to underwater acoustic data. Finally, experiments on the ShipsEar dataset show that our model achieves a recognition accuracy of 96.39%. In conclusion, extensive experiments confirm the effectiveness of our model on the UATR task.
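The abstract does not describe the internals of RHAF, so the following is only a minimal sketch of the general idea of attention-based fusion with a residual connection, not the authors' implementation. It assumes single-head scaled dot-product cross-attention in which temporal frames (standing in for wav2vec 2.0 outputs) query frequency features (standing in for the CQT branch). The function name `residual_cross_attention`, the random projection matrices, and all dimensions are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def residual_cross_attention(temporal, spectral, w_q, w_k, w_v):
    """Fuse two feature sequences with cross-attention plus a residual path.

    temporal: (T, d) array, e.g. frame-level features from a pre-trained encoder.
    spectral: (F, d) array, e.g. features from a time-frequency branch.
    Returns the fused (T, d) array and the (T, F) attention weights.
    """
    queries = temporal @ w_q                      # (T, d_k)
    keys = spectral @ w_k                         # (F, d_k)
    values = spectral @ w_v                       # (F, d)
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])  # (T, F)
    weights = softmax(scores, axis=-1)            # each row sums to 1
    attended = weights @ values                   # (T, d)
    # Residual connection: the temporal features pass through unchanged
    # and the attended spectral information is added on top.
    return temporal + attended, weights


# Hypothetical sizes: 49 temporal frames, 84 frequency tokens, width 16.
rng = np.random.default_rng(0)
T, F, d, d_k = 49, 84, 16, 8
temporal = rng.standard_normal((T, d))   # stand-in for wav2vec 2.0 features
spectral = rng.standard_normal((F, d))   # stand-in for CQT-branch features
w_q = rng.standard_normal((d, d_k)) * 0.1
w_k = rng.standard_normal((d, d_k)) * 0.1
w_v = rng.standard_normal((d, d)) * 0.1

fused, weights = residual_cross_attention(temporal, spectral, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

Because of the residual path, the fused output keeps the shape of the temporal sequence, so in a design like this the pre-trained representation is adjusted by the spectral branch rather than replaced by it.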
Pages: 16