MDF-Net: A multi-view dual-attention fusion network for efficient bird sound classification

Cited by: 2
Authors
Xie, Shanshan [1, 2]
Xie, Jiangjian [1, 2]
Zhang, Junguo [1, 2]
Zhang, Yan [4]
Wang, Lifeng [1, 2]
Hu, Huijian [3, 5]
Affiliations
[1] Beijing Forestry Univ, Sch Technol, Beijing 100083, Peoples R China
[2] Key Lab State Forestry & Grassland Adm Forestry Eq, Beijing 100083, Peoples R China
[3] State Key Lab Efficient Prod Forest Resources, Beijing 100083, Peoples R China
[4] Southwest Forestry Univ, Coll Math & Phys, Kunming 650224, Peoples R China
[5] Guangdong Acad Sci, Inst Zool, Guangdong Key Lab Anim Conservat & Resource Utiliz, Guangdong Publ Lab Wild Anim Conservat & Utilizat, Guangzhou 510260, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Bird sound classification; Multi-view fusion; Multi-head self-attention; Cross-attention
DOI
10.1016/j.apacoust.2024.110138
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Bird sound is a crucial means of acoustic communication for birds, and research on its classification supports the protection, health, and diversity of ecosystems. Extracting multi-view features with several feature extraction methods provides more comprehensive information about bird sound and is a promising way to improve classification accuracy. However, efficiently fusing multi-view features to identify birds accurately remains a challenging task. To address this problem, this paper presents an efficient bird sound classification framework called MDF-Net. The approach extracts four acoustic features from bird sound recordings (the wavelet transform spectrogram, the Hilbert-Huang transform spectrogram, the short-time Fourier transform spectrogram, and Mel-frequency cepstral coefficients) to describe the characteristics of bird sound from different views. A convolutional neural network then serves as an advanced feature extractor to obtain deep features of these spectrograms. Next, a multi-head self-attention mechanism focuses on the correlation and importance of the features within each view to obtain essential and expressive feature representations, and a cross-attention mechanism aligns and correlates information across the four views, which makes it easier for the classifier to capture the relationships between features of different views. Finally, the outputs of the dual-attention mechanism are combined into a multi-view fusion feature with difference and diversity, which is applied to bird sound classification. In this study, recordings from 16 bird species constitute the dataset. The multi-view fusion feature based on MDF-Net achieved a classification accuracy of 97.29%, outperforming the 9 single features and 3 fused features used in the experiments.
The results demonstrate that the proposed MDF-Net successfully captures the feature relationships within each single view and across multiple views, providing crucial information for correctly classifying bird sound samples. The approach efficiently fuses the features of different views and improves the performance of bird sound classification.
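The dual-attention fusion described in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: it assumes each view has already been reduced to a short list of equal-length feature vectors (random numbers stand in for the CNN outputs), it uses single-head scaled dot-product attention without learned projections in place of multi-head attention, and the function names (`attention`, `mean_pool`, `dual_attention_fuse`) and the pooling-plus-concatenation fusion step are hypothetical choices made for illustration.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention on lists of vectors (no learned weights)."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def mean_pool(tokens):
    """Average a list of vectors into a single vector."""
    n = len(tokens)
    return [sum(t[j] for t in tokens) / n for j in range(len(tokens[0]))]


def dual_attention_fuse(views):
    """views: dict mapping view name -> list of token vectors of equal size."""
    # Self-attention: each view attends over its own tokens.
    refined = {name: attention(t, t, t) for name, t in views.items()}
    fused = []
    for name, tokens in refined.items():
        # Cross-attention: this view's tokens query the tokens of the other views.
        others = [tok for other, toks in refined.items() if other != name
                  for tok in toks]
        crossed = attention(tokens, others, others)
        # Assumed fusion step: concatenate pooled self- and cross-attended summaries.
        fused.extend(mean_pool(tokens) + mean_pool(crossed))
    return fused


random.seed(0)
dim, n_tokens = 8, 5
view_names = ["wavelet", "hht", "stft", "mfcc"]  # the four views from the abstract
views = {name: [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
         for name in view_names}
fused = dual_attention_fuse(views)
print(len(fused))  # 4 views * 2 summaries * 8 dims = 64
```

In a real model the fused vector would feed a classifier over the 16 species, and the attention layers would carry trainable projection weights; both are omitted here to keep the sketch short.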
Pages: 12
Related papers
50 in total
  • [1] Multi-view Instance Attention Fusion Network for classification
    Li, Jinxing
    Zhou, Chuhao
    Ji, Xiaoqiang
    Li, Mu
    Lu, Guangming
    Xu, Yong
    Zhang, David
    INFORMATION FUSION, 2024, 101
  • [2] GAF-Net: Graph attention fusion network for multi-view semi-supervised classification
    Song, Na
    Du, Shide
    Wu, Zhihao
    Zhong, Luying
    Yang, Laurence T.
    Yang, Jing
    Wang, Shiping
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 238
  • [3] Dual-attention EfficientNet based on multi-view feature fusion for cervical squamous intraepithelial lesions diagnosis
    Guo, Ying
    Wang, Yongxiong
    Yang, Huimin
    Zhang, Jiapeng
    Sun, Qing
    BIOCYBERNETICS AND BIOMEDICAL ENGINEERING, 2022, 42 (02) : 529 - 542
  • [4] MDF-Net: A Multi-Scale Dynamic Fusion Network for Breast Tumor Segmentation of Ultrasound Images
    Qi, Wenbo
    Wu, H. C.
    Chan, S. C.
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 4842 - 4855
  • [5] Sequential attention layer-wise fusion network for multi-view classification
    Teng, Qing
    Yang, Xibei
    Sun, Qiguo
    Wang, Pingxin
    Wang, Xun
    Xu, Taihua
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (12) : 5549 - 5561
  • [6] MVC-NET: MULTI-VIEW CHEST RADIOGRAPH CLASSIFICATION NETWORK WITH DEEP FUSION
    Zhu, Xiongfeng
    Feng, Qianjin
    2021 IEEE 18TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (ISBI), 2021, : 554 - 558
  • [7] DA-Net: Dual-attention network for multivariate time series classification
    Chen, Rongjun
    Yan, Xuanhui
    Wang, Shiping
    Xiao, Guobao
    INFORMATION SCIENCES, 2022, 610 : 472 - 487
  • [8] Fusion of heterogeneous attention mechanisms in multi-view convolutional neural network for text classification
    Liang, Yunji
    Li, Huihui
    Guo, Bin
    Yu, Zhiwen
    Zheng, Xiaolong
    Samtani, Sagar
    Zeng, Daniel D.
    INFORMATION SCIENCES, 2021, 548 : 295 - 312
  • [9] Multi-view Multi-label Learning with Dual-Attention Networks for Stroke Screen
    Shen, Jundong
    Zhang, Yi
    Yu, Cheng
    Wang, Chongjun
    2020 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, 2020, : 1124 - 1128
  • [10] MVF-Net: A Multi-View Fusion Network for Event-Based Object Classification
    Deng, Yongjian
    Chen, Hao
    Li, Youfu
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (12) : 8275 - 8284