AReN: A Deep Learning Approach for Sound Event Recognition Using a Brain Inspired Representation

Cited by: 35
Authors
Greco, Antonio [1]
Petkov, Nicolai [2]
Saggese, Alessia [1]
Vento, Mario [1]
Affiliations
[1] Univ Salerno, Dept Informat Engn Elect Engn & Appl Math, I-84084 Fisciano, Italy
[2] Univ Groningen, Fac Sci & Engn, NL-9712 CP Groningen, Netherlands
Keywords
Training; Time-frequency analysis; Machine learning; Spectrogram; Surveillance; Signal to noise ratio; Standards; audio surveillance; deep learning; CNN; gammatonegram; brain inspired representation; NEURAL-NETWORK; CLASSIFICATION; SURVEILLANCE; FEATURES; PATTERN;
DOI
10.1109/TIFS.2020.2994740
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Audio surveillance has attracted wide interest in recent years, owing to the large number of situations in which such systems can be used, either alone or combined with video-based algorithms. In this paper we propose a deep learning method to automatically recognize events of interest in the context of audio surveillance (namely screams, glass breaking and gunshots). The audio stream is represented by a gammatonegram image. We propose a 21-layer CNN to which we feed sections of the gammatonegram representation. The output layer of this CNN has one unit per class. We trained the CNN, called AReN, using a problem-driven data augmentation that extends the training dataset with gammatonegram images extracted from sounds acquired at different signal-to-noise ratios. We evaluated it on three freely available datasets, namely SESA, MIVIA Audio Events and MIVIA Road Events, and achieved recognition rates of 91.43%, 99.62% and 100%, respectively. We compared our method with other state-of-the-art methods, based on both traditional machine learning and deep learning. The comparison confirms the effectiveness of the proposed approach, which outperforms the existing methods in terms of recognition rate. We show experimentally that the proposed network is resilient to noise, significantly reduces the false positive rate and generalizes to different scenarios. Furthermore, AReN is able to process 5 audio frames per second on a standard CPU and, consequently, it is suitable for real audio surveillance applications.
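The abstract describes augmenting the training set with sounds at different signal-to-noise ratios. The following is only a rough sketch of one common way to produce such data, mixing a clean event with background noise at a chosen SNR; it is not taken from the paper. The function names, the synthetic sine "event", the Gaussian noise and the SNR values are all assumptions made for illustration.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(event, noise, snr_db):
    """Scale `noise` so the event-to-noise power ratio is `snr_db`, then add it."""
    if len(event) != len(noise):
        raise ValueError("event and noise must have the same length")
    p_event = power(event)
    p_noise = power(noise)
    if p_noise == 0:
        raise ValueError("noise must not be silent")
    # From SNR_dB = 10 * log10(P_event / P_noise), solve for the noise power.
    target_p_noise = p_event / (10 ** (snr_db / 10))
    gain = math.sqrt(target_p_noise / p_noise)
    return [e + gain * n for e, n in zip(event, noise)]


def measured_snr_db(event, mixed):
    """Recover the SNR of a mixture, given the clean event it was built from."""
    residual = [m - e for m, e in zip(mixed, event)]
    return 10 * math.log10(power(event) / power(residual))


# Hypothetical data: a 1-second 440 Hz tone at 8 kHz and white Gaussian noise.
rng = random.Random(0)
event = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]

# Hypothetical SNR levels; the levels used in the paper are not given here.
augmented = {snr: mix_at_snr(event, noise, snr) for snr in (5, 10, 15, 20, 25, 30)}
```

Each entry of `augmented` is one noisy copy of the same event; in a pipeline like the one the abstract outlines, each copy would then be converted to a time-frequency image before training.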
Pages: 3610-3624
Number of pages: 15
Related papers
50 in total
  • [41] Representation learning in a deep network for license plate recognition
    Sajed Rakhshani
    Esmat Rashedi
    Hossein Nezamabadi-pour
    Multimedia Tools and Applications, 2020, 79 : 13267 - 13289
  • [42] UniformFace: Learning Deep Equidistributed Representation for Face Recognition
    Duan, Yueqi
    Lu, Jiwen
    Zhou, Jie
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 3410 - 3419
  • [43] Towards Universal Representation Learning for Deep Face Recognition
    Shi, Yichun
    Yu, Xiang
    Sohn, Kihyuk
    Chandraker, Manmohan
    Jain, Anil K.
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 6816 - 6825
  • [44] Survey of Deep Representation Learning for Speech Emotion Recognition
    Latif, Siddique
    Rana, Rajib
    Khalifa, Sara
    Jurdak, Raja
    Qadir, Junaid
    Schuller, Bjorn
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2023, 14 (02) : 1634 - 1654
  • [45] Representation learning in a deep network for license plate recognition
    Rakhshani, Sajed
    Rashedi, Esmat
    Nezamabadi-pour, Hossein
    MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (19-20) : 13267 - 13289
  • [46] Deep learning recognition of diseased and normal cell representation
    Iqbal, Muhammad Shahid
    Ahmad, Iftikhar
    Bin, Luo
    Khan, Suleman
    Rodrigues, Joel J. P. C.
    TRANSACTIONS ON EMERGING TELECOMMUNICATIONS TECHNOLOGIES, 2021, 32 (07):
  • [47] Action Recognition in the Dark via Deep Representation Learning
    Ulhaq, Anwaar
    2018 IEEE THIRD INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, APPLICATIONS AND SYSTEMS (IPAS), 2018, : 131 - 136
  • [48] Deep Learning Approach for Receipt Recognition
    Anh Duc Le
    Dung Van Pham
    Tuan Anh Nguyen
    FUTURE DATA AND SECURITY ENGINEERING (FDSE 2019), 2019, 11814 : 705 - 712
  • [49] A Deep Learning approach for Modulation Recognition
    Zhang, Yu
    Liu, Tong
    Zhang, Linbo
    Wang, Kan
    2018 IEEE 23RD INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP), 2018,
  • [50] A deep learning approach for speaker recognition
    Hourri, Soufiane
    Kharroubi, Jamal
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2020, 23 (01) : 123 - 131