MDF-Net: A multi-view dual-attention fusion network for efficient bird sound classification

Cited by: 2
Authors
Xie, Shanshan [1, 2]
Xie, Jiangjian [1, 2]
Zhang, Junguo [1, 2]
Zhang, Yan [4]
Wang, Lifeng [1, 2]
Hu, Huijian [3, 5]
Affiliations
[1] Beijing Forestry Univ, Sch Technol, Beijing 100083, Peoples R China
[2] Key Lab State Forestry & Grassland Adm Forestry Eq, Beijing 100083, Peoples R China
[3] State Key Lab Efficient Prod Forest Resources, Beijing 100083, Peoples R China
[4] Southwest Forestry Univ, Coll Math & Phys, Kunming 650224, Peoples R China
[5] Guangdong Acad Sci, Inst Zool, Guangdong Key Lab Anim Conservat & Resource Utiliz, Guangdong Publ Lab Wild Anim Conservat & Utilizat, Guangzhou 510260, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Bird sound classification; Multi-view fusion; Multi-head self-attention; Cross-attention
DOI
10.1016/j.apacoust.2024.110138
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Bird sound serves as a crucial means of acoustic communication for birds, and research on its classification supports the protection, health, and diversity of ecosystems. Extracting multi-view features with several feature extraction methods can provide more comprehensive information about bird sound, which is a promising way to improve the accuracy of bird sound classification. However, efficiently fusing multi-view features to identify birds accurately remains a challenging task. To address this problem, this paper presents an efficient bird sound classification framework called MDF-Net. The approach extracts four acoustic features from bird sound audio, namely the wavelet transform spectrogram, the Hilbert-Huang transform spectrogram, the short-time Fourier transform spectrogram, and Mel-frequency cepstral coefficients, to describe the characteristics of bird sound from different views. A convolutional neural network is then used as an advanced feature extractor to obtain deep features from these representations. Next, a multi-head self-attention mechanism focuses on the correlation and importance of different features within each view to obtain essential and expressive feature representations, while a cross-attention mechanism aligns and correlates information across the four views, which makes it easier for the classifier to understand the relationships between features of different views. Finally, the outputs of the dual-attention mechanism are combined to construct a multi-view fusion feature with difference and diversity, which is applied to bird sound classification. In this study, audio recordings from 16 bird species constitute the dataset. The multi-view fusion feature based on MDF-Net achieved a classification accuracy of 97.29%, outperforming the 9 single features and 3 fused features used in the experiments.
The results demonstrate that the proposed MDF-Net successfully captures feature relationships both within a single view and between multiple views, providing crucial information for correctly classifying bird sound samples. The approach efficiently fuses the features of different views and improves the performance of bird sound classification.
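The dual-attention idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single attention head with no learned projections, random vectors standing in for the CNN deep features, mean pooling, and concatenation as the fusion step. The helper names (`attention`, `mean_pool`, `fuse_views`) and the token and dimension sizes are hypothetical. Plain Python lists are used so the sketch runs without dependencies; a real model would use a tensor library.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(queries[0])
    scores = matmul(queries, transpose(keys))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, values), weights


def mean_pool(tokens):
    """Average token vectors into a single vector."""
    return [sum(col) / len(tokens) for col in zip(*tokens)]


def fuse_views(views):
    """Build one fused vector from a dict of view name -> token vectors.

    Assumption: fusion is a concatenation of pooled self-attention and
    cross-attention outputs; the paper's actual fusion rule may differ.
    """
    fused = []
    for name, tokens in views.items():
        # Self-attention: tokens of one view attend to each other.
        self_out, _ = attention(tokens, tokens, tokens)
        # Cross-attention: this view queries the tokens of all other views.
        others = [t for other, toks in views.items() if other != name for t in toks]
        cross_out, _ = attention(tokens, others, others)
        fused.extend(mean_pool(self_out) + mean_pool(cross_out))
    return fused


# Stand-in deep features: 4 views, 3 tokens each, 4 dimensions per token.
rng = random.Random(0)
view_names = ["wavelet", "hht", "stft", "mfcc"]
views = {
    name: [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(3)]
    for name in view_names
}

fused = fuse_views(views)
print(len(fused))  # 4 views * (self + cross) * 4 dims = 32
```

In this sketch the fused vector would be passed to a classifier; the abstract does not specify the classifier or the number of attention heads, so both are left out.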
Pages: 12