DURATION ROBUST WEAKLY SUPERVISED SOUND EVENT DETECTION

Authors
Dinkel, Heinrich [1]
Yu, Kai [1]
Affiliations
[1] Shanghai Jiao Tong Univ, SpeechLab, Dept Comp Sci & Engn, MoE Key Lab Artificial Intelligence, Shanghai, Peoples R China
Keywords
weakly supervised sound event detection; convolutional neural networks; recurrent neural networks; semi-supervised duration estimation
DOI
10.1109/icassp40776.2020.9053459
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Task 4 of the DCASE2018 challenge demonstrated that substantially more research is needed before sound event detection can be applied in real-world settings. An analysis of the challenge results shows that the most successful models are biased towards predicting long (e.g., over 5 s) clips. This work investigates the performance impact of median filter post-processing with a fixed-size window and advocates double thresholding as a more robust and predictable post-processing method. Further, four different temporal subsampling methods within the CRNN framework are proposed: mean-max, α-mean-max, Lp-norm and convolutional. We show that, for this task, subsampling the temporal resolution within a neural network improves the F1 score as well as the model's robustness towards short, sporadic sound events. Our best single model achieves 30.1% F1 on the evaluation set and the best fusion model 32.5%, while being robust to event length variations.
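The double thresholding mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `double_threshold` and the threshold values 0.75 and 0.2 are assumptions chosen for illustration. The idea shown is that a contiguous run of frames above a low threshold is kept as an event only if at least one frame in it also exceeds a high threshold, so no fixed window length is involved.

```python
import numpy as np

def double_threshold(probs, hi=0.75, lo=0.2):
    """Return (start, end) frame index pairs (end exclusive) for one class.

    hi and lo are illustrative values, not taken from the paper.
    """
    probs = np.asarray(probs, dtype=float)
    events = []
    start = None
    for i, above_lo in enumerate(probs > lo):
        if above_lo and start is None:
            # A candidate segment begins at the first frame above lo.
            start = i
        elif not above_lo and start is not None:
            # The segment ended; keep it only if some frame exceeded hi.
            if probs[start:i].max() > hi:
                events.append((start, i))
            start = None
    # Handle a segment that runs to the end of the clip.
    if start is not None and probs[start:].max() > hi:
        events.append((start, len(probs)))
    return events

frame_probs = [0.1, 0.3, 0.8, 0.3, 0.1, 0.3, 0.3, 0.1, 0.9]
print(double_threshold(frame_probs))
```

In this example the single-frame peak at the end of the sequence is kept as an event, whereas a median filter with a window wider than the event would tend to suppress it.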
Pages: 311-315
Page count: 5