MDF-Net: A multi-view dual-attention fusion network for efficient bird sound classification

Cited by: 2
Authors
Xie, Shanshan [1, 2]
Xie, Jiangjian [1, 2]
Zhang, Junguo [1, 2]
Zhang, Yan [4]
Wang, Lifeng [1, 2]
Hu, Huijian [3, 5]
Affiliations
[1] Beijing Forestry Univ, Sch Technol, Beijing 100083, Peoples R China
[2] Key Lab State Forestry & Grassland Adm Forestry Eq, Beijing 100083, Peoples R China
[3] State Key Lab Efficient Prod Forest Resources, Beijing 100083, Peoples R China
[4] Southwest Forestry Univ, Coll Math & Phys, Kunming 650224, Peoples R China
[5] Guangdong Acad Sci, Inst Zool, Guangdong Key Lab Anim Conservat & Resource Utiliz, Guangdong Publ Lab Wild Anim Conservat & Utilizat, Guangzhou 510260, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Bird sound classification; Multi-view fusion; Multi-head self-attention; Cross-attention
DOI
10.1016/j.apacoust.2024.110138
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Bird sound serves as a crucial means of acoustic communication for birds, and research on its classification supports the protection, health, and diversity of ecosystems. Extracting multi-view features with several feature extraction methods provides more comprehensive information about bird sound and is a promising way to improve classification accuracy. However, efficiently fusing multi-view features to identify birds accurately remains a challenging task. To address this problem, this paper presents an efficient bird sound classification framework called MDF-Net. The approach extracts four acoustic features from bird sound recordings (wavelet transform spectrogram, Hilbert-Huang transform spectrogram, short-time Fourier transform spectrogram, and Mel-frequency cepstral coefficients) to describe the characteristics of bird sound from different views. A convolutional neural network then serves as an advanced feature extractor to obtain deep features from these spectrograms. Next, the multi-head self-attention mechanism focuses on the correlation and importance of different features within each view to obtain essential and expressive feature representations, while the cross-attention mechanism aligns and correlates information across the four views, which makes it easier for the classifier to understand the relationships between features of different views. Finally, the outputs of the dual-attention mechanism are combined to construct a multi-view fusion feature with difference and diversity, which is applied to bird sound classification. In this study, recordings from 16 bird species constitute the dataset. The multi-view fusion feature based on MDF-Net achieved a classification accuracy of 97.29%, outperforming the 9 single features and 3 fused features used in the experiments. The results demonstrate that the proposed MDF-Net successfully captures feature relationships both within a single view and across multiple views, providing crucial information for correctly classifying bird sound samples. The approach efficiently fuses the features of different views and improves the performance of bird sound classification.
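The dual-attention fusion described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the random arrays stand in for CNN deep features of the four views, the random projection matrices stand in for learned weights, and the function names (`multi_head_self_attention`, `fuse_views`), the mean pooling, and the concatenation-based fusion are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention; q: (..., n_q, d), k and v: (..., n_k, d).
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def multi_head_self_attention(x, n_heads, rng):
    # x: (tokens, dim). Random projections are placeholders for learned weights.
    tokens, dim = x.shape
    assert dim % n_heads == 0
    d_head = dim // n_heads
    w_q, w_k, w_v, w_o = (
        rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(4)
    )

    def split(t):
        # (tokens, dim) -> (heads, tokens, d_head)
        return t.reshape(tokens, n_heads, d_head).transpose(1, 0, 2)

    heads = attention(split(x @ w_q), split(x @ w_k), split(x @ w_v))
    merged = heads.transpose(1, 0, 2).reshape(tokens, dim)
    return merged @ w_o

def fuse_views(view_feats, n_heads=4, seed=0):
    # Hypothetical fusion: intra-view self-attention, then each view queries
    # the tokens of all other views (cross-attention); pooled results are
    # concatenated into one fused feature vector.
    rng = np.random.default_rng(seed)
    intra = [multi_head_self_attention(x, n_heads, rng) for x in view_feats]
    parts = []
    for i, x in enumerate(intra):
        others = np.concatenate(
            [y for j, y in enumerate(intra) if j != i], axis=0
        )
        cross = attention(x, others, others)
        parts.append(np.concatenate([x.mean(axis=0), cross.mean(axis=0)]))
    return np.concatenate(parts)

# Placeholder deep features for four views (e.g. wavelet, HHT, STFT, MFCC),
# each a sequence of 32-dimensional tokens of a different length.
data_rng = np.random.default_rng(42)
views = [data_rng.standard_normal((n, 32)) for n in (10, 12, 8, 6)]
fused = fuse_views(views)
print(fused.shape)  # (256,)
```

Each view contributes a 32-dimensional intra-view summary and a 32-dimensional cross-view summary, so four views yield a 256-dimensional fused vector that a classifier could consume. The actual MDF-Net may pool and combine these differently.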
Pages: 12
Related papers
50 in total
  • [21] Dual Fusion-Propagation Graph Neural Network for Multi-View Clustering
    Xiao, Shunxin
    Du, Shide
    Chen, Zhaoliang
    Zhang, Yunhe
    Wang, Shiping
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 9203 - 9215
  • [22] MVF-SleepNet: Multi-View Fusion Network for Sleep Stage Classification
    Li, Yujie
    Chen, Jingrui
    Ma, Wenjun
    Zhao, Gansen
    Fan, Xiaomao
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2024, 28 (05) : 2485 - 2495
  • [23] A Deep Learning Approach for Crop Disease and Pest Classification Using Swin Transformer and Dual-Attention Multi-Scale Fusion Network
    Karthik, R.
    Ajay, Armaano
    Singh Bisht, Akshaj
    Illakiya, T.
    Suganthi, K.
    IEEE ACCESS, 2024, 12 : 152639 - 152655
  • [24] DR-Net: A Multi-View Face Synthesis Network Driven by Dual Representation
    Huang, Xianliang
    Lang, Yining
    Guo, Ying
    He, Yuan
    Xue, Hui
    Zhao, Li
    Zhou, Shuigeng
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023: 1751 - 1756
  • [25] SADCL-Net: Sparse-driven Attention with Dual-Consistency Learning Network for Incomplete Multi-view Clustering
    Xue, Sicheng
    Zhu, Changming
    MULTIMEDIA SYSTEMS, 2024, 30 (05)
  • [26] Hyperspectral Image Classification Based on Double-Branch Multi-Scale Dual-Attention Network
    Zhang, Heng
    Liu, Hanhu
    Yang, Ronghao
    Wang, Wei
    Luo, Qingqu
    Tu, Changda
    REMOTE SENSING, 2024, 16 (12)
  • [27] DAR-MVSNet: a novel dual attention residual network for multi-view stereo
    Li, Tingshuai
    Liang, Hu
    Wen, Changchun
    Qu, Jiacheng
    Zhao, Shengrong
    Zhang, Qingmeng
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (8-9) : 5857 - 5866
  • [28] Joint long and short span self-attention network for multi-view classification
    Chen, Zhikui
    Lou, Kai
    Liu, Zhenjiao
    Li, Yue
    Luo, Yiming
    Zhao, Liang
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 235
  • [29] An improved multi-view attention network inspired by coupled P system for node classification
    Liu, Qian
    Liu, Xiyu
    PLOS ONE, 2022, 17 (04)
  • [30] A multi-slice attention fusion and multi-view personalized fusion lightweight network for Alzheimer's disease diagnosis
    Zhang, Qiongmin
    Long, Ying
    Cai, Hongshun
    Yu, Siyi
    Shi, Yin
    Tan, Xiaowei
    BMC MEDICAL IMAGING, 2024, 24 (01)