A Novel Multi-Feature Fusion Model Based on Pre-Trained Wav2vec 2.0 for Underwater Acoustic Target Recognition

Authors
Pu, Zijun [1]
Zhang, Qunfei [1]
Xue, Yangtao [1]
Zhu, Peican [2]
Cui, Xiaodong [1]
Affiliations
[1] Northwestern Polytech Univ, Sch Marine Sci & Technol, Xian 710072, Peoples R China
[2] Northwestern Polytech Univ, Sch Artificial Intelligence Opt & Elect iOPEN, Xian 710072, Peoples R China
Keywords
underwater acoustic target recognition; deep learning; multi-feature fusion; wav2vec 2.0; CQT; Mel-spectrogram
DOI
10.3390/rs16132442
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science];
Discipline classification code
08 ; 0830 ;
Abstract
Although recent data-driven Underwater Acoustic Target Recognition (UATR) methods have come to dominate marine acoustics, they are hampered by complex ocean environments and rather small datasets. To tackle these challenges, researchers have turned to transfer learning for UATR tasks. However, existing pre-trained models are trained on speech audio and are not well suited to underwater acoustic data, so they must be further optimized before they can be applied to the UATR task. Here, we propose a novel UATR framework called Attention Layer Supplement Integration (ALSI), which integrates large pre-trained neural networks with attention modules customized for acoustic data. Specifically, the ALSI model consists of two main modules: Scale ResNet and Residual Hybrid Attention Fusion (RHAF). First, the Scale ResNet module takes the Constant-Q transform (CQT) feature as input to extract the relatively important frequency information. Next, RHAF takes the temporal feature extracted by wav2vec 2.0 and the frequency feature extracted by Scale ResNet as input and uses an attention mechanism to better integrate the time-frequency features with the temporal feature. In this way, the RHAF module helps wav2vec 2.0, which is trained on speech data, adapt to underwater acoustic data. Finally, experiments on the ShipsEar dataset demonstrate that our model achieves a recognition accuracy of 96.39%. Overall, extensive experiments confirm the effectiveness of our model on the UATR task.
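The abstract does not spell out the internals of RHAF, so the following is only a minimal sketch of the general idea it describes: temporal features attend over frequency features, and the attended result is added back through a residual connection. The function names, the toy inputs, and the omission of learned projection layers are all assumptions made for illustration; this is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def residual_cross_attention(temporal, spectral):
    """Fuse two feature sequences with scaled dot-product cross-attention.

    temporal: T vectors of width d (hypothetical stand-in for wav2vec 2.0
              frame features).
    spectral: F vectors of width d (hypothetical stand-in for CQT-derived
              features from a CNN).
    Each temporal vector acts as a query over all spectral vectors; the
    attended spectral summary is added back to the query (residual).
    No learned projections are used here, which is a simplification.
    """
    d = len(temporal[0])
    fused = []
    for query in temporal:
        # Similarity of this temporal frame to every spectral vector.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in spectral
        ]
        weights = softmax(scores)
        # Weighted average of the spectral vectors.
        context = [
            sum(w * value[i] for w, value in zip(weights, spectral))
            for i in range(d)
        ]
        # Residual connection keeps the original temporal information.
        fused.append([q + c for q, c in zip(query, context)])
    return fused


# Toy inputs: 3 temporal frames and 2 spectral vectors, both of width 4.
temporal = [[0.1, 0.2, 0.3, 0.4], [0.0, 1.0, 0.0, 1.0], [0.5, 0.5, 0.5, 0.5]]
spectral = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
fused = residual_cross_attention(temporal, spectral)
```

The output keeps the temporal sequence's shape (one fused vector per temporal frame), which is one plausible reason to use the temporal branch as the query side: downstream layers that expect wav2vec-style frame features would not need to change.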
Pages: 16
Related papers
50 in total
  • [31] Named Entity Recognition Model of Power Equipment Based on Multi-feature Fusion
    Wu, Yun
    Ma, Xiangwen
    Yang, Jieming
    Wang, Anping
    PRICAI 2022: TRENDS IN ARTIFICIAL INTELLIGENCE, PT II, 2022, 13630 : 255 - 267
  • [32] MTLSER: Multi-task learning enhanced speech emotion recognition with pre-trained acoustic model
    Chen, Zengzhao
    Liu, Chuan
    Wang, Zhifeng
    Zhao, Chuanxu
    Lin, Mengting
    Zheng, Qiuyu
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 273
  • [33] Underground multi-target recognition of ground penetrating radar based on multi-feature information fusion
    Zou, Hailin
    Liu, Chanjuan
    Zhou, Shusen
    Zang, Mujun
    Metallurgical and Mining Industry, 2015, 7 (07): : 274 - 282
  • [34] A Novel Human Action Recognition Algorithm Based on Decision Level Multi-Feature Fusion
    Song Wei
    Liu Ningning
    Yang Guosheng
    Yang Pei
    CHINA COMMUNICATIONS, 2015, 12 (02) : 93 - 102
  • [35] A Novel Human Action Recognition Algorithm Based on Decision Level Multi-Feature Fusion
    SONG Wei
    LIU Ningning
    YANG Guosheng
    YANG Pei
    China Communications, 2015, (S2) : 93 - 102
  • [36] A Novel Human Action Recognition Algorithm Based on Decision Level Multi-Feature Fusion
    SONG Wei
    LIU Ningning
    YANG Guosheng
    YANG Pei
    China Communications (中国通信), 2015, 12 (S2) : 93 - 102
  • [37] Target Type Recognition Algorithm for SAR Image Based on Multi-feature Fusion Classifier of KPFD
    Kong, Yingying
    Chen, Weiyang
    Leung, Henry
    PROCEEDINGS OF 2015 IEEE 5TH INTERNATIONAL CONFERENCE ON ELECTRONICS INFORMATION AND EMERGENCY COMMUNICATION, 2015, : 435 - 439
  • [38] A thermal infrared target tracking based on multi-feature fusion and adaptive model update
    Wang, Yong
    Huo, Lile
    Fan, Yunsheng
    Wang, Guofeng
    INFRARED PHYSICS & TECHNOLOGY, 2024, 139
  • [39] Underwater Acoustic Target Recognition Based on Multi-timeslice Demodulation Line Spectrum Feature
    Shi, Guangzhi
    Hu, Junchuan
    Han, Mei
    Li, Yuyang
    2008 INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION, VOLS 1-4, 2008, : 835 - 839
  • [40] Underwater Target Noise Recognition and Classification Technology based on Multi-Classes Feature Fusion
    Zhang S.
    Wang C.
    Sun Q.
    Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University, 2020, 38 (02): : 366 - 376