AReN: A Deep Learning Approach for Sound Event Recognition Using a Brain Inspired Representation

Cited by: 35
Authors
Greco, Antonio [1 ]
Petkov, Nicolai [2 ]
Saggese, Alessia [1 ]
Vento, Mario [1 ]
Affiliations
[1] Univ Salerno, Dept Informat Engn Elect Engn & Appl Math, I-84084 Fisciano, Italy
[2] Univ Groningen, Fac Sci & Engn, NL-9712 CP Groningen, Netherlands
Keywords
Training; Time-frequency analysis; Machine learning; Spectrogram; Surveillance; Signal to noise ratio; Standards; audio surveillance; deep learning; CNN; gammatonegram; brain inspired representation; NEURAL-NETWORK; CLASSIFICATION; SURVEILLANCE; FEATURES; PATTERN;
DOI
10.1109/TIFS.2020.2994740
Chinese Library Classification
TP301 [Theory and Methods];
Discipline Code
081202 ;
Abstract
Audio surveillance has gained wide interest in recent years, owing to the large number of situations in which such systems can be used, either alone or combined with video-based algorithms. In this paper we propose a deep learning method to automatically recognize events of interest in the context of audio surveillance, namely screams, glass breaking and gunshots. The audio stream is represented as a gammatonegram image, and sections of this representation are fed to a 21-layer CNN whose output units correspond to the event classes. We trained the CNN, called AReN, by taking advantage of a problem-driven data augmentation that extends the training dataset with gammatonegram images computed from sounds acquired at different signal-to-noise ratios. We evaluated the method on three freely available datasets, namely SESA, MIVIA Audio Events and MIVIA Road Events, achieving recognition rates of 91.43%, 99.62% and 100%, respectively. A comparison with state-of-the-art methods, based both on traditional machine learning and on deep learning, confirms the effectiveness of the proposed approach, which outperforms the existing methods in terms of recognition rate. We experimentally show that the proposed network is resilient to noise, significantly reduces the false positive rate and generalizes across different scenarios. Furthermore, AReN processes 5 audio frames per second on a standard CPU and is therefore suitable for real audio surveillance applications.
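The abstract outlines two ingredients of the AReN pipeline: a gammatonegram time-frequency image fed to a 21-layer CNN, and a problem-driven augmentation that re-mixes the event sounds at different signal-to-noise ratios. The sketch below illustrates both ideas under stated assumptions: it is not the authors' code, the names `mix_at_snr` and `TinyAudioCNN` are illustrative, the network is a small stand-in rather than the actual 21-layer AReN architecture, and a random tensor replaces the real gammatonegram computation.

```python
# Illustrative sketch only (assumptions noted above), not the AReN implementation.
import numpy as np
import torch
import torch.nn as nn


def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so that the clean/noise power ratio equals `snr_db`, then mix.
    This is the core operation behind SNR-based training-set augmentation."""
    noise = noise[: len(clean)]
    p_clean = np.mean(clean ** 2) + 1e-12
    p_noise = np.mean(noise ** 2) + 1e-12
    target_p_noise = p_clean / (10.0 ** (snr_db / 10.0))
    return clean + noise * np.sqrt(target_p_noise / p_noise)


class TinyAudioCNN(nn.Module):
    """Small 2-D CNN over a gammatonegram-like time-frequency image.
    AReN itself has 21 layers; this stub only mirrors the input/output interface,
    with e.g. scream, glass breaking, gunshot and background as the 4 classes."""

    def __init__(self, n_classes: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)           # 1 s of synthetic audio at 16 kHz
    noise = rng.standard_normal(16000)
    noisy = mix_at_snr(clean, noise, snr_db=5)   # augmented copy at 5 dB SNR
    # A gammatonegram of `noisy` (via a gammatone filterbank) would be computed
    # here; a random image stands in for it in this sketch.
    gtg = torch.randn(1, 1, 64, 128)             # (batch, channel, bands, frames)
    print(TinyAudioCNN()(gtg).shape)             # torch.Size([1, 4])
```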
Pages: 3610 - 3624
Page count: 15
Related papers
50 in total
  • [11] Sound event classification using deep neural network based transfer learning
    Lim, Hyungjun
    Kim, Myung Jong
    Kim, Hoirin
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2016, 35 (02): 143 - 148
  • [12] Noise Robust Sound Event Detection Using Deep Learning and Audio Enhancement
    Wan, Tongtang
    Zhou, Yi
    Ma, Yongbao
    Liu, Hongqing
    2019 IEEE 19TH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (ISSPIT 2019), 2019,
  • [13] Deep Representation Learning for Multimodal Emotion Recognition Using Physiological Signals
    Zubair, Muhammad
    Woo, Sungpil
    Lim, Sunhwan
    Yoon, Changwoo
    IEEE ACCESS, 2024, 12 : 106605 - 106617
  • [14] Heart Sound Classification Using Multi Modal Data Representation and Deep Learning
    Lee, Jang Hyung
    Kyung, Sun Young
    Oh, Pyung Chun
    Kim, Kwang Gi
    Shin, Dong Jin
    JOURNAL OF MEDICAL IMAGING AND HEALTH INFORMATICS, 2020, 10 (03) : 537 - 543
  • [15] Automated brain tumor recognition using equilibrium optimizer with deep learning approach on MRI images
    Ragab, Mahmoud
    Katib, Iyad
    Sharaf, Sanaa A.
    Alterazi, Hassan A.
    Subahi, Alanoud
    Alattas, Sana G.
    Binyamin, Sami Saeed
    Alyami, Jaber
    SCIENTIFIC REPORTS, 2024, 14 (01):
  • [16] An extended dictionary representation approach with deep subspace learning for facial expression recognition
    Sun, Zhe
    Chiong, Raymond
    Hu, Zheng-ping
    NEUROCOMPUTING, 2018, 316 : 1 - 9
  • [17] Using Deep Learning Approach in Flight Exceedance Event Analysis
    Shyur, Huan-Jyh
    Cheng, Chi-Bin
    Hsiao, Yu-Lin
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2021, 37 (06) : 1405 - 1418
  • [18] Sound event localization and detection based on deep learning
    Zhao, Dada
    Ding, Kai
    Qi, Xiaogang
    Chen, Yu
    Feng, Hailin
    JOURNAL OF SYSTEMS ENGINEERING AND ELECTRONICS, 2024, 35 (02) : 294 - 301
  • [19] A survey of Deep Learning for Polyphonic Sound event detection
    Dang, An
    Vu, Toan H.
    Wang, Jia-Ching
    PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON ORANGE TECHNOLOGIES (ICOT), 2017, : 75 - 78
  • [20] Sound Event Localization and Detection Based on Deep Learning
    Zhao, Dada
    Ding, Kai
    Qi, Xiaogang
    Chen, Yu
    Feng, Hailin
    JOURNAL OF SYSTEMS ENGINEERING AND ELECTRONICS, 2024, 35 (02) : 294 - 301