A research for sound event localization and detection based on local-global adaptive fusion and temporal importance network

Cited by: 0
Authors
Shi, Di [1]
Guo, Min [1]
Ma, Miao [1]
Affiliations
[1] Shaanxi Normal Univ, Sch Comp Sci, Key Lab Modern Teaching Technol, Minist Educ, Xian 710119, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Sound event localization and detection; Transformer block; Enhanced axial cross attention; Adaptive fusion module; Positional attention temporal context; CONVOLUTIONAL NEURAL-NETWORKS; PARAMETERS;
DOI
10.1007/s00530-024-01582-8
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Sound event localization and detection (SELD) systems can provide intelligent sound processing and analysis functions for a variety of devices. However, most existing deep learning-based networks simply concatenate convolutional neural networks (CNNs) and recurrent neural networks, which loses key feature information in the audio and makes accurate localization and detection more difficult. In this paper, we propose a local-global adaptive fusion and temporal importance network. First, a CNN block and a multi-scale enhanced axial cross attention Transformer block learn the local and global features, respectively. Then, the adaptive fusion module fuses the local and global features. Finally, the positional attention temporal context module explores the positional information in the sound temporal sequence to capture the important features. Experimental results on the Sony-TAu Realistic Spatial Soundscapes 2022 dataset and the synthetic dataset show that the proposed model reduces $ER_{20^{\circ}}$ and $LE_{CD}$ to 0.65 and $22.3^{\circ}$, respectively, increases $F_{20^{\circ}}$ and $LR_{CD}$ to 31.1% and 54.8%, respectively, and reduces the comprehensive evaluation metric, the SELD score, to 0.48, outperforming the other models compared.
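The abstract reports four component metrics and one aggregate score without stating how they combine. A minimal sketch follows, assuming the aggregate is the DCASE-style average of four error-like terms: the error rate, one minus the F-score, the localization error normalized by 180 degrees, and one minus the localization recall. The function name `seld_score` is hypothetical, and the record itself does not confirm that the authors used exactly this formula.

```python
def seld_score(er: float, f: float, le_deg: float, lr: float) -> float:
    """Aggregate SELD score (lower is better), assuming the DCASE-style definition.

    er     -- location-dependent error rate (lower is better)
    f      -- location-dependent F-score as a fraction in [0, 1]
    le_deg -- class-dependent localization error in degrees, in [0, 180]
    lr     -- class-dependent localization recall as a fraction in [0, 1]
    """
    # Turn every component into an error-like term, then average the four.
    return (er + (1.0 - f) + le_deg / 180.0 + (1.0 - lr)) / 4.0


# Component values reported in the abstract.
score = seld_score(er=0.65, f=0.311, le_deg=22.3, lr=0.548)
print(round(score, 2))  # 0.48
```

Under this assumption the four reported values give about 0.479, which rounds to the 0.48 stated in the abstract.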
Pages: 12