AReN: A Deep Learning Approach for Sound Event Recognition Using a Brain Inspired Representation

Cited by: 35
Authors
Greco, Antonio [1]
Petkov, Nicolai [2]
Saggese, Alessia [1]
Vento, Mario [1]
Affiliations
[1] Univ Salerno, Dept Informat Engn Elect Engn & Appl Math, I-84084 Fisciano, Italy
[2] Univ Groningen, Fac Sci & Engn, NL-9712 CP Groningen, Netherlands
Keywords
Training; Time-frequency analysis; Machine learning; Spectrogram; Surveillance; Signal to noise ratio; Standards; audio surveillance; deep learning; CNN; gammatonegram; brain inspired representation; NEURAL-NETWORK; CLASSIFICATION; SURVEILLANCE; FEATURES; PATTERN;
DOI
10.1109/TIFS.2020.2994740
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Audio surveillance has gained wide interest in recent years, owing to the large number of situations in which such systems can be used, either alone or combined with video-based algorithms. In this paper we propose a deep learning method to automatically recognize events of interest in the context of audio surveillance (namely screams, glass breaking and gunshots). The audio stream is represented by a gammatonegram image. We propose a 21-layer CNN to which we feed sections of the gammatonegram representation; its output units correspond to the classes. We trained the CNN, called AReN, using a problem-driven data augmentation that extends the training dataset with gammatonegram images extracted from sounds acquired at different signal-to-noise ratios. We evaluated it on three freely available datasets, namely SESA, MIVIA Audio Events and MIVIA Road Events, and achieved recognition rates of 91.43%, 99.62% and 100%, respectively. We compared our method with other state-of-the-art methods based on both traditional machine learning and deep learning. The comparison confirms the effectiveness of the proposed approach, which outperforms the existing methods in terms of recognition rate. We show experimentally that the proposed network is resilient to noise, can significantly reduce the false positive rate and generalizes across different scenarios. Furthermore, AReN can process 5 audio frames per second on a standard CPU and is consequently suitable for real audio surveillance applications.
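The augmentation mentioned in the abstract relies on sounds at different signal-to-noise ratios. As a minimal sketch of that idea, and not the authors' pipeline, the Python snippet below mixes an event with background noise at a chosen SNR. The function names `mix_at_snr` and `measured_snr_db`, the synthetic 440 Hz tone, the white noise, the 16 kHz sample rate and the SNR values are all assumptions made for illustration; computing the gammatonegram of each mixture is left out.

```python
import numpy as np


def mix_at_snr(event, noise, snr_db):
    """Return event + scaled noise so the mixture has the requested SNR (dB).

    Hypothetical helper for illustration; not taken from the paper.
    """
    event = np.asarray(event, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if noise.size < event.size:
        raise ValueError("noise must be at least as long as the event")
    noise = noise[: event.size]
    p_event = np.mean(event ** 2)  # average power of the event
    p_noise = np.mean(noise ** 2)  # average power of the noise
    # Solve snr_db = 10 * log10(p_event / (gain**2 * p_noise)) for gain.
    gain = np.sqrt(p_event / (p_noise * 10.0 ** (snr_db / 10.0)))
    return event + gain * noise


def measured_snr_db(event, mixture):
    """SNR of a mixture given the clean event it was built from."""
    residual = mixture - event
    return 10.0 * np.log10(np.mean(event ** 2) / np.mean(residual ** 2))


# Synthetic stand-ins for a recorded event and background noise (assumed).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
event = np.sin(2.0 * np.pi * 440.0 * t)
noise = np.random.default_rng(0).normal(size=t.size)

# One augmented copy per SNR level; the levels here are arbitrary examples.
snr_levels_db = (5, 10, 15, 20, 25, 30)
augmented = {snr: mix_at_snr(event, noise, snr) for snr in snr_levels_db}
```

Each entry of `augmented` would then be converted to a time-frequency image before training; the lower the SNR, the more the background dominates the event.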
Pages: 3610-3624
Number of pages: 15
Related papers
50 in total
  • [31] Mathematical representation of emotion using multimodal recognition model with deep multitask learning
    Harata S.
    Sakuma T.
    Kato S.
    Harata, Seiichi (harata@katolab.nitech.ac.jp), 1600, Institute of Electrical Engineers of Japan (140): 1343 - 1351
  • [32] Multimodal Representation Learning for Place Recognition Using Deep Hebbian Predictive Coding
    Pearson, Martin J.
    Dora, Shirin
    Struckmeier, Oliver
    Knowles, Thomas C.
    Mitchinson, Ben
    Tiwari, Kshitij
    Kyrki, Ville
    Bohte, Sander
    Pennartz, Cyriel M. A.
    FRONTIERS IN ROBOTICS AND AI, 2021, 8
  • [33] Occupant behavior monitoring and emergency event detection in single-person households using deep learning-based sound recognition
    Kim, Jinwoo
    Min, Kyungjun
    Jung, Minhyuk
    Chi, Seokho
    BUILDING AND ENVIRONMENT, 2020, 181
  • [34] Heart Sound Recognition Technology Based on Deep Learning
    Huai, Ximing
    Panote, Siriaraya
    Choi, Dongeun
    Kuwahara, Noriaki
    DIGITAL HUMAN MODELING AND APPLICATIONS IN HEALTH, SAFETY, ERGONOMICS AND RISK MANAGEMENT. POSTURE, MOTION AND HEALTH, DHM 2020, PT I, 2020, 12198 : 491 - 500
  • [35] A deep learning approach for predicting critical events using event logs
    Huang, Congfang
    Deep, Akash
    Zhou, Shiyu
    Veeramani, Dharmaraj
    QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL, 2021, 37 (05) : 2214 - 2234
  • [36] Enhanced Local Feature Approach for Overlapping Sound Event Recognition
    Dennis, Jonathan
    Huy Dat Tran
    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2014,
  • [37] Automatic Speaker Recognition using Transfer Learning Approach of Deep Learning Models
    Ganvir, Sonal
    Lal, Nidhi
    PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON INVENTIVE COMPUTATION TECHNOLOGIES (ICICT 2021), 2021, : 595 - 601
  • [38] Deep Unsupervised Representation Learning for Abnormal Heart Sound Classification
    Amiriparian, Shahin
    Schmitt, Maximilian
    Cummins, Nicholas
    Qian, Kun
    Dong, Fengquan
    Schuller, Bjoern
    2018 40TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2018, : 4776 - 4779
  • [39] Event Recognition Based on Deep Learning in Chinese Texts
    Zhang, Yajun
    Liu, Zongtian
    Zhou, Wen
    PLOS ONE, 2016, 11 (08):
  • [40] Deep Representation Learning With Feature Augmentation for Face Recognition
    Sun, Jie
    Lu, Shengli
    Pang, Wei
    Sun, Zhilin
    2019 IEEE 4TH INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING (ICSIP 2019), 2019, : 171 - 175