A Novel Multi-Feature Fusion Model Based on Pre-Trained Wav2vec 2.0 for Underwater Acoustic Target Recognition

Times Cited: 0
Authors
Pu, Zijun [1 ]
Zhang, Qunfei [1 ]
Xue, Yangtao [1 ]
Zhu, Peican [2 ]
Cui, Xiaodong [1 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Marine Sci & Technol, Xian 710072, Peoples R China
[2] Northwestern Polytech Univ, Sch Artificial Intelligence Opt & Elect iOPEN, Xian 710072, Peoples R China
Keywords
underwater acoustic target recognition; deep learning; multi-feature fusion; wav2vec 2.0; CQT; Mel-spectrogram;
DOI
10.3390/rs16132442
Chinese Library Classification (CLC)
X [Environmental Science, Safety Science];
Discipline Classification Code
08; 0830;
Abstract
Although recent data-driven Underwater Acoustic Target Recognition (UATR) methods have come to dominate marine acoustics, they suffer from complex ocean environments and rather small datasets. To tackle these challenges, researchers have turned to transfer learning to fulfill UATR tasks. However, existing pre-trained models are trained on speech data and are not directly suitable for underwater acoustic data; further optimization of these models is therefore needed to adapt them to the UATR task. Here, we propose a novel UATR framework called Attention Layer Supplement Integration (ALSI), which integrates large pre-trained neural networks with customized attention modules for underwater acoustic data. Specifically, the ALSI model consists of two important modules, namely Scale ResNet and Residual Hybrid Attention Fusion (RHAF). First, the Scale ResNet module takes the Constant-Q Transform (CQT) feature as input to extract the relatively important frequency information. Next, RHAF takes as input the temporal features extracted by wav2vec 2.0 and the frequency features extracted by Scale ResNet, and uses an attention mechanism to better integrate the time-frequency features with the temporal features. The RHAF module helps wav2vec 2.0, which was trained on speech data, adapt better to underwater acoustic data. Finally, experiments on the ShipsEar dataset demonstrate that our model achieves a recognition accuracy of 96.39%. In conclusion, extensive experiments confirm the effectiveness of our model on the UATR task.
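To illustrate the fusion idea summarized in the abstract, the following is a minimal PyTorch sketch of an attention-based residual fusion block: temporal embeddings (as produced by wav2vec 2.0) attend over frequency embeddings (as produced by a CQT-based network such as Scale ResNet), with a residual connection on the temporal stream. The module name ResidualHybridAttentionFusion, the dimensions, and the single cross-attention layer are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of cross-attention fusion with a residual connection,
# assuming wav2vec 2.0 frame embeddings and a pooled CQT feature map.
# All shapes and names are hypothetical; this is not the ALSI/RHAF code.
import torch
import torch.nn as nn


class ResidualHybridAttentionFusion(nn.Module):
    """Fuse temporal (wav2vec 2.0) and frequency (CQT-based) features."""

    def __init__(self, temporal_dim=768, freq_dim=256, fused_dim=256, num_heads=4):
        super().__init__()
        # Project both streams into a shared embedding space.
        self.temporal_proj = nn.Linear(temporal_dim, fused_dim)
        self.freq_proj = nn.Linear(freq_dim, fused_dim)
        # Temporal features act as queries over frequency features (cross-attention).
        self.cross_attn = nn.MultiheadAttention(fused_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(fused_dim)

    def forward(self, temporal_feat, freq_feat):
        # temporal_feat: (B, T_t, temporal_dim); freq_feat: (B, T_f, freq_dim)
        q = self.temporal_proj(temporal_feat)
        kv = self.freq_proj(freq_feat)
        attended, _ = self.cross_attn(q, kv, kv)
        # Residual connection preserves the original temporal information.
        return self.norm(q + attended)


if __name__ == "__main__":
    rhaf = ResidualHybridAttentionFusion()
    wav2vec_out = torch.randn(2, 49, 768)   # e.g. ~1 s of audio through wav2vec 2.0
    cqt_out = torch.randn(2, 32, 256)       # e.g. pooled CQT-derived feature map
    fused = rhaf(wav2vec_out, cqt_out)
    print(fused.shape)                      # torch.Size([2, 49, 256])
```

The residual path reflects the abstract's stated goal of letting the frequency branch supplement, rather than overwrite, the pre-trained temporal representation.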
Pages: 16