Multi-Scale and Single-Scale Fully Convolutional Networks for Sound Event Detection

Cited by: 0
Authors
Wang Y. [1]
Zhao G. [1]
Xiong K. [1]
Shi G. [1]
Zhang Y. [1]
Affiliation
[1] School of Artificial Intelligence, Xidian University, Xi'an, 710071, Shaanxi
Source
Neurocomputing | 2021 / Vol. 421
Keywords
Dilated convolution; Multi-Scale Fully Convolutional Networks; Single-Scale Fully Convolutional Networks; Sound Event Detection; Temporal dependencies;
DOI
10.1016/j.neucom.2020.09.038
Abstract
In many Sound Event Detection (SED) systems, Recurrent Neural Networks (RNNs), such as long short-term memory units and gated recurrent units, are used to capture temporal dependencies. However, the length of the temporal dependencies an RNN can capture is limited, so it fails to model sound events of long duration. Moreover, an RNN cannot process data in parallel, which leads to low efficiency and low industrial value. Given these shortcomings, we propose to use dilated convolution (and causal dilated convolution) to capture temporal dependencies, because it preserves high time resolution and obtains longer temporal dependencies while the filter size and the network depth remain unchanged. In addition, dilated convolution can be parallelized, so it has higher efficiency and industrial value. Based on this, we propose Single-Scale Fully Convolutional Networks (SS-FCN), composed of convolutional neural networks and dilated convolutional networks, with the former providing frequency invariance and the latter capturing temporal dependencies. Using dilated convolution to control the length of temporal dependencies, we observe that SS-FCN, which models a single length of temporal dependencies, achieves superior detection performance for only a limited range of event types. For better performance, we propose Multi-Scale Fully Convolutional Networks (MS-FCN), in which a feature fusion module is introduced to capture both long-term and short-term dependencies by fusing features with different lengths of temporal dependencies. The proposed methods achieve competitive performance on three major datasets with higher efficiency. The results show that SED systems based on Fully Convolutional Networks have further research value and potential. © 2020 Elsevier B.V.
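The abstract's claim that dilated convolution reaches longer temporal dependencies with the filter size and network depth unchanged can be illustrated with a small sketch. This is not the authors' SS-FCN or MS-FCN implementation; the function names, the kernel size of 3, and the dilation schedule 1, 2, 4, 8 are assumptions chosen only for illustration.

```python
import numpy as np

def receptive_field(kernel_size, dilations):
    """Number of input frames seen by one output frame of a stack of
    stride-1 1-D convolutions, one layer per entry in `dilations`."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

def causal_dilated_conv1d(x, w, dilation):
    """Causal dilated convolution: y[t] = sum_j w[j] * x[t - j*dilation],
    treating samples before t = 0 as zero (left padding only)."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([
        sum(w[j] * xp[pad + t - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

# Same depth (4 layers) and same filter size (3); only the dilation differs.
# The dilation schedule is a hypothetical example, not taken from the paper.
plain = receptive_field(3, [1, 1, 1, 1])
dilated = receptive_field(3, [1, 2, 4, 8])

# An impulse at frame 5 can only influence frames 5, 7 and 9 (dilation 2),
# never an earlier frame, which is what "causal" means here.
impulse = np.zeros(12)
impulse[5] = 1.0
response = causal_dilated_conv1d(impulse, [1.0, 1.0, 1.0], dilation=2)
```

Under these assumptions the plain stack covers 9 frames while the dilated stack covers 31, and every output frame is computed independently of the others, which is why the operation parallelizes across time in a way a recurrent update does not.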
Pages: 51-65
Page count: 14
Related papers
50 in total
  • [1] Multi-Scale and Single-Scale Fully Convolutional Networks for Sound Event Detection
    Wang, Yingbin
    Zhao, Guanghui
    Xiong, Kai
    Shi, Guangming
    Zhang, Yumeng
    NEUROCOMPUTING, 2021, 421 : 51 - 65
  • [2] A Multi-scale Pyramid of Fully Convolutional Networks for Automatic Cell Detection
    Gu, Jiang
    Zhu, Yichen
    Yang, Bohong
    Jia, Jingkai
    Wang, Juanjuan
    Yang, Jian
    Zhang, Wenqiang
    PROCEEDINGS 2018 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2018, : 633 - 636
  • [3] Multi-Scale Fully Convolutional Network for Face Detection in the Wild
    Bai, Yancheng
    Ghanem, Bernard
    2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2017, : 2078 - 2087
  • [4] Cell detection in pathology and microscopy images with multi-scale fully convolutional neural networks
    Pan, Xipeng
    Yang, Dengxian
    Li, Lingqiao
    Liu, Zhenbing
    Yang, Huihua
    Cao, Zhiwei
    He, Yubei
    Ma, Zhen
    Chen, Yiyi
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2018, 21 (06): : 1721 - 1743
  • [5] Cell detection in pathology and microscopy images with multi-scale fully convolutional neural networks
    Xipeng Pan
    Dengxian Yang
    Lingqiao Li
    Zhenbing Liu
    Huihua Yang
    Zhiwei Cao
    Yubei He
    Zhen Ma
    Yiyi Chen
    World Wide Web, 2018, 21 : 1721 - 1743
  • [6] Single-scale renormalisation group improvement of multi-scale effective potentials
    Leonardo Chataignier
    Tomislav Prokopec
    Michael G. Schmidt
    Bogumiła Świeżewska
    Journal of High Energy Physics, 2018
  • [7] Single-scale renormalisation group improvement of multi-scale effective potentials
    Chataignier, Leonardo
    Prokopec, Tomislav
    Schmidt, Michael G.
    Swiezewska, Bogumila
    JOURNAL OF HIGH ENERGY PHYSICS, 2018, (03):
  • [8] MULTI-SCALE RECURRENT NEURAL NETWORK FOR SOUND EVENT DETECTION
    Lu, Rui
    Duan, Zhiyao
    Zhang, Changshui
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 131 - 135
  • [9] Fully convolutional multi-scale dense networks for monocular depth estimation
    Liu, Jiwei
    Zhang, Yunzhou
    Cui, Jiahua
    Feng, Yonghui
    Pang, Linzhuo
    IET COMPUTER VISION, 2019, 13 (05) : 515 - 522
  • [10] Salient Object Detection with Chained Multi-Scale Fully Convolutional Network
    Tang, Youbao
    Wu, Xiangqian
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 618 - 626