MDF-Net: A multi-view dual-attention fusion network for efficient bird sound classification

Cited by: 2
Authors
Xie, Shanshan [1, 2]
Xie, Jiangjian [1, 2]
Zhang, Junguo [1, 2]
Zhang, Yan [4]
Wang, Lifeng [1, 2]
Hu, Huijian [3, 5]
Affiliations
[1] Beijing Forestry Univ, Sch Technol, Beijing 100083, Peoples R China
[2] Key Lab State Forestry & Grassland Adm Forestry Eq, Beijing 100083, Peoples R China
[3] State Key Lab Efficient Prod Forest Resources, Beijing 100083, Peoples R China
[4] Southwest Forestry Univ, Coll Math & Phys, Kunming 650224, Peoples R China
[5] Guangdong Acad Sci, Inst Zool, Guangdong Key Lab Anim Conservat & Resource Utiliz, Guangdong Publ Lab Wild Anim Conservat & Utilizat, Guangzhou 510260, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Bird sound classification; Multi-view fusion; Multi-head self-attention; Cross-attention
DOI
10.1016/j.apacoust.2024.110138
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Bird sound is a crucial means of acoustic communication for birds, and research on its classification supports the protection, health, and diversity of ecosystems. Using various feature extraction methods to obtain multi-view features can provide more comprehensive information about bird sound, which is a promising way to improve the accuracy of bird sound classification. However, efficiently fusing multi-view features to identify birds accurately remains a challenging task. To address this problem, this paper presents an efficient bird sound classification framework called MDF-Net. The approach extracts four acoustic features from bird sound recordings, namely the wavelet transform spectrogram, the Hilbert-Huang transform spectrogram, the short-time Fourier transform spectrogram, and Mel-frequency cepstral coefficients, to describe the characteristics of bird sound from different views. Subsequently, a convolutional neural network is used as an advanced feature extractor to obtain deep features from these representations. Then, the multi-head self-attention mechanism focuses on the correlation and importance of different features within each view to obtain essential and expressive feature representations. The cross-attention mechanism is employed to align and correlate information across the four views, which makes it easier for the classifier to understand the relationships between features of different views. Finally, the results of the dual-attention mechanism are combined to construct a multi-view fusion feature with difference and diversity, which is applied to bird sound classification. In this study, audio recordings from 16 bird species constitute the dataset. The multi-view fusion feature based on MDF-Net achieved a classification accuracy of 97.29%, outperforming the 9 single features and 3 fused features used in the experiments.
The results demonstrate that the proposed MDF-Net successfully captures the feature relationships within each single view and across multiple views, providing crucial information for correctly classifying bird sound samples. The approach efficiently fuses the features of different views and improves the performance of bird sound classification.
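The dual-attention fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify layer sizes or the fusion rule, so everything below is an assumption made for illustration. Random vectors stand in for the CNN deep features of each view, a single attention head without learned projections replaces multi-head attention, and fusion is done by mean-pooling and concatenating the two attention branches. The names `attention`, `mean_pool`, and `fuse_views` are hypothetical.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention on lists of equal-length vectors.

    Simplification (assumption): one head, no learned Q/K/V projections.
    """
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def mean_pool(tokens):
    """Average a list of token vectors into one vector."""
    n = len(tokens)
    return [sum(t[j] for t in tokens) / n for j in range(len(tokens[0]))]


def fuse_views(views):
    """Build one fused vector from a dict of view name -> token vectors."""
    fused = []
    for name, tokens in views.items():
        # Within-view branch: self-attention over this view's own tokens.
        intra = attention(tokens, tokens, tokens)
        # Cross-view branch: this view queries the tokens of all other views.
        others = [t for other, toks in views.items() if other != name
                  for t in toks]
        inter = attention(tokens, others, others)
        # Assumed fusion rule: pool each branch, then concatenate.
        fused += mean_pool(intra) + mean_pool(inter)
    return fused


random.seed(0)
DIM, TOKENS = 8, 5  # hypothetical feature size and tokens per view
views = {
    name: [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(TOKENS)]
    for name in ("wavelet", "hht", "stft", "mfcc")  # stand-ins for CNN outputs
}
fused = fuse_views(views)
print(len(fused))  # 4 views * 2 branches * 8 dims = 64
```

In a real model the fused vector would feed a classifier over the 16 species, and the attention layers would carry learned projection weights with several heads.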
Pages: 12