DURATION ROBUST WEAKLY SUPERVISED SOUND EVENT DETECTION

Cited by: 0
Authors
Dinkel, Heinrich [1]
Yu, Kai [1]
Affiliations
[1] Shanghai Jiao Tong Univ, SpeechLab, Dept Comp Sci & Engn, MoE Key Lab Artificial Intelligence, Shanghai, Peoples R China
Keywords
weakly supervised sound event detection; convolutional neural networks; recurrent neural networks; semi-supervised duration estimation;
DOI
10.1109/icassp40776.2020.9053459
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Task 4 of the DCASE2018 challenge demonstrated that substantially more research is needed before sound event detection can be applied in real-world settings. An analysis of the challenge results shows that the most successful models are biased towards predicting long (e.g., over 5 s) clips. This work investigates the performance impact of median filter post-processing with a fixed-size window and advocates double thresholding as a more robust and predictable post-processing method. Further, four different temporal subsampling methods within the CRNN framework are proposed: mean-max, α-mean-max, Lp-norm and convolutional. We show that, for this task, subsampling the temporal resolution within the neural network improves the F1 score as well as robustness towards short, sporadic sound events. Our best single model achieves 30.1% F1 on the evaluation set and the best fusion model 32.5%, while being robust to event length variations.
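The double thresholding mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common hysteresis formulation: a run of consecutive frames above a low threshold is kept as an event only if at least one frame in it also exceeds a high threshold. The function name `double_threshold`, the threshold values 0.75 and 0.2, and the toy probabilities are illustrative assumptions, not details taken from this record.

```python
def double_threshold(probs, hi=0.75, lo=0.2):
    """Return (onset, offset) frame pairs, offset exclusive.

    Assumed formulation: a run of frames with probability > lo is kept
    as one event only if some frame in the run is also > hi.
    The default thresholds are illustrative, not from the paper.
    """
    events = []
    n = len(probs)
    i = 0
    while i < n:
        if probs[i] > lo:
            start = i
            has_peak = False
            # Walk to the end of the run that stays above the low threshold.
            while i < n and probs[i] > lo:
                if probs[i] > hi:
                    has_peak = True
                i += 1
            if has_peak:
                events.append((start, i))
        else:
            i += 1
    return events


# Toy frame-level probabilities for a single event class (made up).
probs = [0.1, 0.3, 0.8, 0.4, 0.1, 0.3, 0.3, 0.1, 0.9]
print(double_threshold(probs))  # [(1, 4), (8, 9)]
```

Unlike a median filter with a fixed window, this rule has no window length, so a confident one-frame event (frame 8 above) survives while a weak run without any peak (frames 5 to 6) is discarded.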
Pages: 311 / 315
Number of pages: 5
Related papers
50 items in total
  • [41] Robust Sound Event Detection in Continuous Audio Environments
    Zhang, Haomin
    McLoughlin, Ian
    Song, Yan
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2977 - 2981
  • [42] Robust sound event detection in bioacoustic sensor networks
    Lostanlen, Vincent
    Salamon, Justin
    Farnsworth, Andrew
    Kelling, Steve
    Bello, Juan Pablo
    PLOS ONE, 2019, 14 (10):
  • [43] Robust fall detection in video surveillance based on weakly supervised learning
    Wu, Lian
    Huang, Chao
    Zhao, Shuping
    Li, Jinkai
    Zhao, Jianchuan
    Cui, Zhongwei
    Yu, Zhen
    Xu, Yong
    Zhang, Min
    NEURAL NETWORKS, 2023, 163 : 286 - 297
  • [44] Duration-Controlled LSTM for Polyphonic Sound Event Detection
    Hayashi, Tomoki
    Watanabe, Shinji
    Toda, Tomoki
    Hori, Takaaki
    Le Roux, Jonathan
    Takeda, Kazuya
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (11) : 2059 - 2070
  • [45] A study on the waveform-based end-to-end deep convolutional neural network for weakly supervised sound event detection
    Lee, Seokjin
    Kim, Minhan
    Jeong, Youngho
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2020, 39 (01) : 24 - 31
  • [46] TASK-AWARE MEAN TEACHER METHOD FOR LARGE SCALE WEAKLY LABELED SEMI-SUPERVISED SOUND EVENT DETECTION
    Yan, Jie
    Song, Yan
    Dai, Li-Rong
    McLoughlin, Ian
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 326 - 330
  • [47] Weakly supervised foreground learning for weakly supervised localization and detection
    Zhang, Chen-Lin
    Li, Yin
    Wu, Jianxin
    PATTERN RECOGNITION, 2023, 137
  • [48] Weakly labeled sound event detection with a capsule-transformer model
    Li, Kanghao
    Yang, Shuguo
    Zhao, Li
    Wang, Wenwu
    DIGITAL SIGNAL PROCESSING, 2024, 146
  • [49] Sound Event Detection by Pseudo-Labeling in Weakly Labeled Dataset
    Park, Chungho
    Kim, Donghyeon
    Ko, Hanseok
    SENSORS, 2021, 21 (24)
  • [50] Adaptive Hierarchical Pooling for Weakly-supervised Sound Event Detection
    Gao, Lijian
    Zhou, Ling
    Mao, Qirong
    Dong, Ming
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 1779 - 1787