AReN: A Deep Learning Approach for Sound Event Recognition Using a Brain Inspired Representation

Cited by: 35

Authors:

Greco, Antonio [1]

Petkov, Nicolai [2]

Saggese, Alessia [1]

Vento, Mario [1]

Affiliations:

[1] Univ Salerno, Dept Informat Engn Elect Engn & Appl Math, I-84084 Fisciano, Italy

[2] Univ Groningen, Fac Sci & Engn, NL-9712 CP Groningen, Netherlands

Source:

IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY | 2020 / Vol. 15

Keywords:

Training; Time-frequency analysis; Machine learning; Spectrogram; Surveillance; Signal to noise ratio; Standards; audio surveillance; deep learning; CNN; gammatonegram; brain inspired representation; NEURAL-NETWORK; CLASSIFICATION; SURVEILLANCE; FEATURES; PATTERN;

DOI:

10.1109/TIFS.2020.2994740

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

Audio surveillance has attracted wide interest in recent years, owing to the large number of situations in which such systems can be used, either alone or combined with video-based algorithms. In this paper we propose a deep learning method to automatically recognize events of interest in the context of audio surveillance (namely screams, glass breaking and gunshots). The audio stream is represented by a gammatonegram image. We propose a 21-layer CNN to which we feed sections of the gammatonegram representation; its output units correspond to the classes. We trained the CNN, called AReN, by taking advantage of a problem-driven data augmentation, which extends the training dataset with gammatonegram images extracted from sounds acquired at different signal-to-noise ratios. We evaluated it on three freely available datasets, namely SESA, MIVIA Audio Events and MIVIA Road Events, and achieved recognition rates of 91.43%, 99.62% and 100%, respectively. We compared our method with other state-of-the-art methodologies based on both traditional machine learning and deep learning. The comparison confirms the effectiveness of the proposed approach, which outperforms the existing methods in terms of recognition rate. We experimentally show that the proposed network is resilient to noise, significantly reduces the false positive rate and generalizes to different scenarios. Furthermore, AReN is able to process 5 audio frames per second on a standard CPU and, consequently, it is suitable for real audio surveillance applications.
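
The abstract mentions training data that covers different signal-to-noise ratios but does not spell out how such data is produced. The sketch below shows one common way to obtain a sound at a chosen SNR, by additively mixing a clean event with background noise. It is an illustration under stated assumptions, not the authors' procedure: the function names `mix_at_snr` and `augment`, the power-based SNR definition, and the listed SNR levels are all hypothetical choices made for this example.

```python
import numpy as np

def mix_at_snr(event, noise, snr_db):
    """Add `noise` to `event`, scaled so the event-to-noise power ratio is `snr_db`.

    Assumption: SNR is defined on mean signal power, and `noise` is at
    least as long as `event` and not all zeros.
    """
    event = np.asarray(event, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)[: len(event)]
    p_event = np.mean(event ** 2)
    p_noise = np.mean(noise ** 2)
    # Target noise power is p_event / 10^(snr_db / 10); solve for the amplitude gain.
    gain = np.sqrt(p_event / (p_noise * 10 ** (snr_db / 10)))
    return event + gain * noise

def augment(event, noise, snr_levels_db=(5, 10, 15, 20, 25, 30)):
    """Return one noisy copy of `event` per SNR level (levels are illustrative)."""
    return {snr: mix_at_snr(event, noise, snr) for snr in snr_levels_db}
```

Each noisy copy would then be converted to a time-frequency image (a gammatonegram in the paper) before being added to the training set; that conversion is outside the scope of this sketch.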


Pages: 3610-3624

Page count: 15
