A research for sound event localization and detection based on local-global adaptive fusion and temporal importance network

Cited by: 0
Authors
Shi, Di [1]
Guo, Min [1]
Ma, Miao [1]
Affiliations
[1] Shaanxi Normal Univ, Sch Comp Sci, Key Lab Modern Teaching Technol, Minist Educ, Xian 710119, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Sound event localization and detection; Transformer block; Enhanced axial cross attention; Adaptive fusion module; Positional attention temporal context; CONVOLUTIONAL NEURAL-NETWORKS; PARAMETERS;
DOI
10.1007/s00530-024-01582-8
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Sound event localization and detection systems can provide intelligent sound processing and analysis functions for a variety of application devices. However, existing deep learning-based networks mostly rely on a simple concatenation of convolutional neural networks (CNN) and recurrent neural networks, which loses key feature information in the audio and makes accurate localization and detection more difficult. In this paper, we propose a local-global adaptive fusion and temporal importance network model. First, the CNN block and the multi-scale enhanced axial cross attention Transformer block learn the local and global features, respectively. Then, the adaptive fusion module fuses the local and global features. Finally, the positional attention temporal context module explores the positional information in the sound temporal sequence and captures the important features. Experimental results on the Sony-TAu Realistic Spatial Soundscapes 2022 dataset and the synthetic dataset show that the $ER_{20^{\circ}}$ and $LE_{CD}$ of the proposed model are reduced to 0.65 and 22.3$^{\circ}$, respectively, the $F_{20^{\circ}}$ and $LR_{CD}$ are increased to 31.1% and 54.8%, respectively, and the comprehensive evaluation metric, the $SELD\ score$, is reduced to 0.48, which is better than the other models compared.
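The abstract reports the four component metrics and the aggregate SELD score but does not state how they are combined. A minimal sketch follows, assuming the aggregate is the unweighted mean used in the DCASE challenge evaluation: the error rate, one minus the F-score, the localization error normalized by 180 degrees, and one minus the localization recall. The function name `seld_score` and its parameter names are illustrative and do not come from the paper.

```python
def seld_score(er: float, f: float, le_deg: float, lr: float) -> float:
    """Aggregate SELD score (lower is better).

    Assumption: the unweighted mean of four terms, each scaled to [0, 1]
    when its input is in range, as in the DCASE challenge evaluation.
      er     -- location-dependent error rate (ER_20deg)
      f      -- location-dependent F-score as a fraction (F_20deg)
      le_deg -- class-dependent localization error in degrees (LE_CD)
      lr     -- class-dependent localization recall as a fraction (LR_CD)
    """
    return (er + (1.0 - f) + le_deg / 180.0 + (1.0 - lr)) / 4.0


# Values reported in the abstract.
score = seld_score(er=0.65, f=0.311, le_deg=22.3, lr=0.548)
print(round(score, 2))  # 0.48
```

Under this assumption the reported component values reproduce the reported aggregate of 0.48, which suggests, but does not confirm, that the paper uses the same definition.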
Pages: 12
Related papers
50 in total
  • [1] Local-Global Feature Adaptive Fusion Network for Building Crack Detection
    He, Yibin
    Yuan, Zhengrong
    Xia, Xinhong
    Yang, Bo
    Wu, Huiting
    Fu, Wei
    Yao, Wenxuan
    SENSORS, 2024, 24 (21)
  • [2] LGTCN: A Spatial-Temporal Traffic Flow Prediction Model Based on Local-Global Feature Fusion Temporal Convolutional Network
    Ye, Wei
    Kuang, Haoxuan
    Deng, Kunxiang
    Zhang, Dongran
    Li, Jun
    APPLIED SCIENCES-BASEL, 2024, 14 (19)
  • [3] Infrared Dim and Small Target Detection Based on Local-Global Feature Fusion
    Ling, Xiao
    Zhang, Chuan
    Yan, Zhijun
    Wang, Bo
    Sheng, Qinghong
    Li, Jun
    APPLIED SCIENCES-BASEL, 2024, 14 (17)
  • [4] Local-global feature fusion network for hyperspectral image classification
    Gan, Yuquan
    Zhang, Hao
    Liu, Weihua
    Ma, Jieming
    Luo, Yiming
    Pan, Yushan
    INTERNATIONAL JOURNAL OF REMOTE SENSING, 2024, 45 (22) : 8548 - 8575
  • [5] Local-Global Fusion Network for Video Super-Resolution
    Su, Dewei
    Wang, Hua
    Jin, Longcun
    Sun, Xianfang
    Peng, Xinyi
    IEEE ACCESS, 2020, 8 : 172443 - 172456
  • [6] Local-Global Transformer Neural Network for temporal action segmentation
    Tian, Xiaoyan
    Jin, Ye
    Tang, Xianglong
    MULTIMEDIA SYSTEMS, 2023, 29 (02) : 615 - 626
  • [7] Polyphonic sound event localization and detection based on Multiple Attention Fusion ResNet
    Zhang, S.
    Zhang, Y.
    Liao, Y.
    Pang, K.
    Wan, Z.
    Zhou, S.
    MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2024, 21 (02) : 2004 - 2023
  • [8] Temporal Context Modeling Network with Local-Global Complementary Architecture for Temporal Proposal Generation
    Yuan, Yunfeng
    Yang, Wenzhu
    Luo, Zifei
    Gou, Ruru
    ELECTRONICS, 2022, 11 (17)
  • [9] Infrared Small Target Detection via Local-Global Feature Fusion
    Wu, Lang
    Ma, Yong
    Fan, Fan
    Huang, Jun
    IEEE SIGNAL PROCESSING LETTERS, 2025, 32 : 466 - 470
  • [10] CONNECTIONIST TEMPORAL LOCALIZATION FOR SOUND EVENT DETECTION WITH SEQUENTIAL LABELING
    Wang, Yun
    Metze, Florian
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019 : 745 - 749