A Progressive Learning Approach for Sound Event Detection with Temporal and Spectral Features Fusion

Cited by: 0
Authors
Zhong, Yilin [1]
Fang, Zhaoer [1]
Wang, Jie [1]
Fan, Bo [1]
Peng, BangHuang [1]
Affiliations
[1] BYD Auto Ind Co LTD, Auto Engn Res Inst, Shenzhen, Peoples R China
Keywords
sound event detection; progressive learning; self-supervised learning
DOI
10.1007/978-981-97-5594-3_18
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Sound Event Detection (SED) has wide applications in real-world systems, including automatic surveillance, smart home devices, and intelligent automobiles. Recent works in SED have achieved significant performance improvements by fine-tuning pre-trained frame-wise audio tagging (AT) models, bridging the gap between the AT and SED tasks. A common limitation, however, is their exclusive reliance on spectral features as input, which makes precise sound event localization challenging. To address this issue, we propose a novel Temporal Mask Model (TMM) that extracts temporal features and is integrated with the Bidirectional Encoder representation from Audio Transformers and CNN (BEATs-CNN) framework, which extracts spectral features. The two types of features are fused with a progressive learning strategy and then fed into a Bidirectional Gated Recurrent Unit (Bi-GRU) to generate predictions. Through extensive experimentation, we demonstrate that our approach surpasses the reported State-Of-The-Art (SOTA) model in Polyphonic Sound Detection Score-scenario1 (PSDS1) and achieves a comparable result in Polyphonic Sound Detection Score-scenario2 (PSDS2) on the DCASE Challenge Task 4.
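As a general illustration of what event localization means for a frame-wise SED model (not the authors' method), the sketch below turns per-frame activity probabilities for one sound class into onset/offset times. The function name `frames_to_events`, the 0.5 threshold, and the 0.02 s frame hop are assumptions chosen for the example; the abstract does not specify any of them.

```python
from typing import List, Tuple


def frames_to_events(
    probs: List[float],
    threshold: float = 0.5,      # assumed decision threshold
    hop_seconds: float = 0.02,   # assumed time step between frames
) -> List[Tuple[float, float]]:
    """Convert frame-wise probabilities into (onset, offset) pairs in seconds."""
    events: List[Tuple[float, float]] = []
    onset = None  # index of the first frame of the event currently open
    for i, p in enumerate(probs):
        active = p >= threshold
        if active and onset is None:
            onset = i  # event starts at this frame
        elif not active and onset is not None:
            # event ended at the previous frame; offset is the start of this one
            events.append((onset * hop_seconds, i * hop_seconds))
            onset = None
    if onset is not None:
        # event still active at the end of the clip
        events.append((onset * hop_seconds, len(probs) * hop_seconds))
    return events


if __name__ == "__main__":
    # Hypothetical probabilities for one class over five frames.
    print(frames_to_events([0.1, 0.9, 0.8, 0.2, 0.7]))
```

Errors in the frame-wise probabilities translate directly into shifted onsets and offsets, which is why the temporal precision of the input features matters for localization.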
Pages: 207-218
Number of pages: 12