A Progressive Learning Approach for Sound Event Detection with Temporal and Spectral Features Fusion

Authors
Zhong, Yilin [1]
Fang, Zhaoer [1]
Wang, Jie [1]
Fan, Bo [1]
Peng, BangHuang [1]
Affiliations
[1] BYD Auto Ind Co LTD, Auto Engn Res Inst, Shenzhen, Peoples R China
Keywords
sound event detection; progressive learning; self-supervised learning
DOI
10.1007/978-981-97-5594-3_18
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Sound Event Detection (SED) has wide applications in real-world systems, including automatic surveillance, smart home devices, and intelligent automobiles. Recent SED work has achieved significant performance improvements by fine-tuning pre-trained frame-wise audio tagging (AT) models, bridging the gap between the AT and SED tasks. A common limitation of these approaches is their exclusive reliance on spectral features as input, which makes precise sound event localization difficult. To address this issue, we propose a novel Temporal Mask Model (TMM) that extracts temporal features and integrate it with the Bidirectional Encoder representation from Audio Transformers and CNN (BEATs-CNN) framework, which extracts spectral features. The two types of features are fused with a progressive learning strategy and then fed into a Bidirectional Gated Recurrent Unit (Bi-GRU) to generate predictions. Through extensive experiments, we demonstrate that our approach surpasses the reported State-Of-The-Art (SOTA) model in Polyphonic Sound Detection Score scenario 1 (PSDS1) and achieves a comparable result in Polyphonic Sound Detection Score scenario 2 (PSDS2) on the DCASE Challenge Task 4.
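The data flow described in the abstract (temporal features and spectral features fused, then passed to a Bi-GRU for frame-level predictions) can be illustrated with a minimal PyTorch sketch. Everything here is an assumption made for illustration: the class name `FusionBiGRU`, the feature dimensions, the use of plain concatenation as the fusion step, and the random tensors standing in for TMM and BEATs-CNN outputs. It does not reproduce the paper's progressive learning strategy, which the abstract does not specify.

```python
import torch
import torch.nn as nn


class FusionBiGRU(nn.Module):
    """Hypothetical sketch: fuse per-frame temporal and spectral features, then run a Bi-GRU."""

    def __init__(self, temporal_dim, spectral_dim, hidden_dim, num_classes):
        super().__init__()
        # Assumption: fusion is plain concatenation along the feature axis.
        self.gru = nn.GRU(
            input_size=temporal_dim + spectral_dim,
            hidden_size=hidden_dim,
            batch_first=True,
            bidirectional=True,
        )
        # Forward and backward hidden states are concatenated, hence 2 * hidden_dim.
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, temporal_feats, spectral_feats):
        # Both inputs: (batch, frames, feature_dim), assumed already aligned in time.
        fused = torch.cat([temporal_feats, spectral_feats], dim=-1)
        hidden, _ = self.gru(fused)
        # Assumption: independent sigmoid per class, so overlapping events can co-occur.
        return torch.sigmoid(self.classifier(hidden))


torch.manual_seed(0)
# All sizes below are arbitrary placeholders, not values from the paper.
model = FusionBiGRU(temporal_dim=64, spectral_dim=128, hidden_dim=32, num_classes=10)
temporal = torch.randn(2, 156, 64)    # stand-in for TMM output
spectral = torch.randn(2, 156, 128)   # stand-in for BEATs-CNN output
frame_probs = model(temporal, spectral)
print(frame_probs.shape)  # one probability per clip, frame, and class
```

The output keeps one probability per frame and class, which is the form needed to localize event onsets and offsets in time rather than only tag a whole clip.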
Pages: 207-218
Page count: 12
Related papers
50 in total
  • [41] Learning Latent Temporal Structure for Complex Event Detection
    Tang, Kevin
    Li Fei-Fei
    Koller, Daphne
    2012 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2012, : 1250 - 1257
  • [42] Monkeypox Detection and Classification Using Deep Learning Based Features Selection and Fusion Approach
    Maqsood, Sarmad
    Damaševičius, Robertas
    2023 IEEE INTERNATIONAL SYSTEMS CONFERENCE, SYSCON, 2023,
  • [43] Sound Event Detection Using Derivative Features in Deep Neural Networks
    Kwak, Jin-Yeol
    Chung, Yong-Joo
    APPLIED SCIENCES-BASEL, 2020, 10 (14):
  • [44] Active Few-Shot Learning for Sound Event Detection
    Wang, Yu
    Cartwright, Mark
    Bello, Juan Pablo
    INTERSPEECH 2022, 2022, : 1551 - 1555
  • [45] Sound learning–based event detection for acoustic surveillance sensors
    Jeong-Sik Park
    Seok-Hoon Kim
    Multimedia Tools and Applications, 2020, 79 : 16127 - 16139
  • [46] Self-Supervised Representation Learning and Temporal-Spectral Feature Fusion for Bed Occupancy Detection
    Song, Yingjian
    Pitafi, Zaid Farooq
    Dou, Fei
    Sun, Jin
    Zhang, Xiang
    Phillips, Bradley G.
    Song, Wenzhan
    PROCEEDINGS OF THE ACM ON INTERACTIVE MOBILE WEARABLE AND UBIQUITOUS TECHNOLOGIES-IMWUT, 2024, 8 (03):
  • [47] COUPLE LEARNING FOR SEMI-SUPERVISED SOUND EVENT DETECTION
    Tao, Rui
    Yan, Long
    Ouchi, Kazushige
    Wang, Xiangdong
    INTERSPEECH 2022, 2022, : 2398 - 2402
  • [48] Audiovisual transfer learning for audio tagging and sound event detection
    Boes, Wim
    Van Hamme, Hugo
    INTERSPEECH 2021, 2021, : 2401 - 2405
  • [49] A Temporal Dependency Based Multi-modal Active Learning Approach for Audiovisual Event Detection
    Thiam, Patrick
    Meudt, Sascha
    Palm, Guenther
    Schwenker, Friedhelm
    NEURAL PROCESSING LETTERS, 2018, 48 (02) : 709 - 732