AReN: A Deep Learning Approach for Sound Event Recognition Using a Brain Inspired Representation

Cited by: 35
Authors
Greco, Antonio [1]
Petkov, Nicolai [2]
Saggese, Alessia [1]
Vento, Mario [1]
Affiliations
[1] Univ Salerno, Dept Informat Engn Elect Engn & Appl Math, I-84084 Fisciano, Italy
[2] Univ Groningen, Fac Sci & Engn, NL-9712 CP Groningen, Netherlands
Keywords
Training; Time-frequency analysis; Machine learning; Spectrogram; Surveillance; Signal to noise ratio; Standards; audio surveillance; deep learning; CNN; gammatonegram; brain inspired representation; NEURAL-NETWORK; CLASSIFICATION; SURVEILLANCE; FEATURES; PATTERN
DOI
10.1109/TIFS.2020.2994740
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Audio surveillance has attracted wide interest in recent years, owing to the large number of situations in which such systems can be used, either alone or combined with video-based algorithms. In this paper we propose a deep learning method to automatically recognize events of interest in the context of audio surveillance (namely screams, glass breaking and gunshots). The audio stream is represented by a gammatonegram image. We propose a 21-layer CNN to which we feed sections of the gammatonegram representation; the output units of this CNN correspond to the classes. We trained the CNN, called AReN, with a problem-driven data augmentation that extends the training dataset with gammatonegram images extracted from sounds acquired at different signal-to-noise ratios. We evaluated it on three freely available datasets, namely SESA, MIVIA Audio Events and MIVIA Road Events, and achieved recognition rates of 91.43%, 99.62% and 100%, respectively. We compared our method with other state-of-the-art approaches based on both traditional machine learning and deep learning. The comparison confirms the effectiveness of the proposed approach, which outperforms the existing methods in terms of recognition rate. We show experimentally that the proposed network is resilient to noise, can significantly reduce the false positive rate and generalizes across different scenarios. Furthermore, AReN can process 5 audio frames per second on a standard CPU and is consequently suitable for real audio surveillance applications.
Pages: 3610-3624
Number of pages: 15
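The abstract describes a data augmentation that adds training samples recorded at different signal-to-noise ratios. The sketch below illustrates only the general idea of mixing an event with background noise at a chosen SNR. It is not the authors' pipeline: the function names, the synthetic sine "event", the Gaussian noise and the list of SNR levels are all assumptions made for illustration, and the gammatonegram extraction step is omitted.

```python
import math
import random


def power(samples):
    """Mean power of a sequence of audio samples."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(event, noise, snr_db):
    """Add `noise` to `event`, scaled so the event-to-noise power ratio is snr_db."""
    if len(noise) < len(event):
        raise ValueError("noise must be at least as long as event")
    noise = noise[: len(event)]
    p_event, p_noise = power(event), power(noise)
    if p_event == 0 or p_noise == 0:
        raise ValueError("event and noise must have non-zero power")
    # Solve p_event / (gain^2 * p_noise) = 10^(snr_db / 10) for gain.
    gain = math.sqrt(p_event / (p_noise * 10 ** (snr_db / 10)))
    return [e + gain * n for e, n in zip(event, noise)]


def measured_snr_db(event, mixed):
    """SNR in dB of `mixed`, treating (mixed - event) as the noise component."""
    residual = [m - e for m, e in zip(mixed, event)]
    return 10 * math.log10(power(event) / power(residual))


# Hypothetical inputs: a 440 Hz tone stands in for a foreground event,
# Gaussian noise stands in for the background. One second at 16 kHz.
rng = random.Random(0)
event = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

# Hypothetical SNR levels (dB); the values used in the paper are not given here.
augmented = {snr: mix_at_snr(event, noise, snr) for snr in (5, 10, 15, 20, 25, 30)}
```

Each entry of `augmented` is a noisy copy of the same event. In a pipeline like the one the abstract outlines, each copy would then be converted to a time-frequency image before training.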