Sound Event Localization and Detection Based on Deep Learning

Times cited: 0
Authors
Zhao, Dada [1, 2]
Ding, Kai [2]
Qi, Xiaogang [1]
Chen, Yu [2]
Feng, Hailin [1]
Affiliations
[1] Xidian Univ, Sch Math & Stat, Xian 710071, Peoples R China
[2] Sci & Technol Near Surface Detect Lab, Wuxi 214035, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Location awareness; Feature extraction; Neural networks; Convolutional neural networks; Reverberation; Prediction algorithms; Training; sound event localization and detection (SELD); deep learning; convolutional recursive neural network (CRNN); channel attention mechanism; DATA AUGMENTATION; NEURAL-NETWORKS; SPECTRUM;
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Acoustic source localization (ASL) and sound event detection (SED) are two widely studied, largely independent research fields. In recent years, sound event localization and detection (SELD) has become a very active research topic because it provides a more complete spatial and temporal representation of the sound field. This paper presents a deep learning-based algorithm for localizing and detecting multiple overlapping sound events in three-dimensional space. Log-Mel spectra and generalized cross-correlation (GCC) spectra are concatenated along the channel dimension to form the input features. A neural network processes these features through parallel classification and regression branches, which produce the sound event detection and localization results, respectively. A channel attention mechanism is also introduced into the network to selectively enhance features that carry essential information and suppress uninformative ones. Finally, a thorough comparison confirms the efficiency and effectiveness of the proposed SELD algorithm. Field experiments show that the proposed algorithm is robust to reverberation and to changes in the environment, and that it achieves higher recognition and localization accuracy than the baseline method.
Pages: 294-301
Page count: 8
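The abstract describes the pipeline only at a high level, so the following is a minimal NumPy sketch of the three ingredients it names, not the authors' implementation: log-Mel and GCC spectra stacked along a channel axis, a channel attention step, and parallel classification (detection) and regression (localization) heads. Everything specific here is an assumption chosen for illustration: PHAT weighting for the GCC, a 24 kHz sample rate, a 1024-point FFT with a 480-sample hop, 64 Mel bands (with the GCC lag window cropped to 64 lags so both feature types share a shape), four microphones, 11 hypothetical classes, a squeeze-and-excitation-style attention block, per-class Cartesian x, y, z outputs for the regression head, and random untrained weights. The convolutional and recurrent layers of the actual network are omitted, and all function names are hypothetical.

```python
import itertools

import numpy as np

# Assumed front-end settings (hypothetical; the abstract does not state them)
SR, N_FFT, HOP, N_MELS = 24000, 1024, 480, 64


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def stft(x, n_fft=N_FFT, hop=HOP):
    """Hann-windowed STFT of a 1-D signal -> (frames, n_fft // 2 + 1) complex."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)


def mel_filterbank(sr=SR, n_fft=N_FFT, n_mels=N_MELS):
    """Triangular filters spaced evenly on the HTK Mel scale -> (n_mels, n_fft // 2 + 1)."""
    mel_max = 2595.0 * np.log10(1.0 + (sr / 2) / 700.0)
    hz = 700.0 * (10.0 ** (np.linspace(0.0, mel_max, n_mels + 2) / 2595.0) - 1.0)
    bins = np.floor((n_fft + 1) * hz / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / (right - center)
    return fb


def gcc_phat(X1, X2, n_lags=N_MELS):
    """Frame-wise GCC-PHAT, keeping the n_lags lags centred on zero delay."""
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-8                    # PHAT weighting: keep phase only
    n_fft = 2 * (X1.shape[-1] - 1)
    cc = np.fft.irfft(cross, n=n_fft, axis=-1)       # circular cross-correlation per frame
    # Reorder so lags -n_lags/2 .. n_lags/2 - 1 run left to right
    return np.concatenate([cc[:, -n_lags // 2:], cc[:, :n_lags // 2]], axis=-1)


def seld_features(audio):
    """(n_mics, n_samples) -> (n_mics + n_pairs, frames, N_MELS) input tensor."""
    specs = [stft(ch) for ch in audio]
    fb = mel_filterbank()
    log_mel = [np.log(np.abs(S) ** 2 @ fb.T + 1e-8) for S in specs]
    gcc = [gcc_phat(specs[i], specs[j])
           for i, j in itertools.combinations(range(len(specs)), 2)]
    # Both feature types are (frames, N_MELS), so they stack along a channel axis
    return np.stack(log_mel + gcc, axis=0)


def channel_attention(feat, w1, w2):
    """Squeeze-and-excitation-style channel reweighting of a (C, T, F) map."""
    z = feat.mean(axis=(1, 2))                       # squeeze: one statistic per channel
    a = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))        # FC + ReLU, then FC + sigmoid
    return feat * a[:, None, None], a                # emphasize or suppress each channel


def seld_heads(feat, w_sed, w_doa):
    """Parallel frame-wise heads: event activity (classification) and DOA (regression)."""
    x = feat.transpose(1, 0, 2).reshape(feat.shape[1], -1)   # (T, C * F)
    sed = sigmoid(x @ w_sed)      # (T, n_classes): per-class activity in (0, 1)
    doa = np.tanh(x @ w_doa)      # (T, 3 * n_classes): assumed x, y, z per class
    return sed, doa


rng = np.random.default_rng(0)
audio = rng.standard_normal((4, SR))     # placeholder: 1 s of noise on 4 microphones
feat = seld_features(audio)              # 4 log-Mel + 6 GCC channels
C, n_classes = feat.shape[0], 11
w1 = 0.1 * rng.standard_normal((C // 2, C))      # random, untrained weights
w2 = 0.1 * rng.standard_normal((C, C // 2))
att, a = channel_attention(feat, w1, w2)
d = att.shape[0] * att.shape[2]
sed, doa = seld_heads(att, 0.01 * rng.standard_normal((d, n_classes)),
                      0.01 * rng.standard_normal((d, 3 * n_classes)))
print(feat.shape, sed.shape, doa.shape)  # -> (10, 48, 64) (48, 11) (48, 33)
```

Cropping the GCC output to as many lags as there are Mel bands is what lets the two feature types be stacked as channels of one tensor; with four microphones this gives four log-Mel channels plus six pairwise GCC channels. In a trained model the attention weights `a` would be learned so that informative channels receive weights near 1 and uninformative ones near 0.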