AReN: A Deep Learning Approach for Sound Event Recognition Using a Brain Inspired Representation

Cited by: 35

Authors:

Greco, Antonio [1]

Petkov, Nicolai [2]

Saggese, Alessia [1]

Vento, Mario [1]

Affiliations:

[1] Univ Salerno, Dept Informat Engn Elect Engn & Appl Math, I-84084 Fisciano, Italy

[2] Univ Groningen, Fac Sci & Engn, NL-9712 CP Groningen, Netherlands

Source:

IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY | 2020 / Vol. 15

Keywords:

Training; Time-frequency analysis; Machine learning; Spectrogram; Surveillance; Signal to noise ratio; Standards; audio surveillance; deep learning; CNN; gammatonegram; brain inspired representation; NEURAL-NETWORK; CLASSIFICATION; SURVEILLANCE; FEATURES; PATTERN;

DOI:

10.1109/TIFS.2020.2994740

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Subject classification code:

081202;

Abstract:

Audio surveillance has gained wide interest in recent years, owing to the large number of situations in which such systems can be used, either alone or combined with video-based algorithms. In this paper we propose a deep learning method to automatically recognize events of interest in the context of audio surveillance (namely screams, breaking glass and gunshots). The audio stream is represented by a gammatonegram image. We propose a 21-layer CNN that takes sections of the gammatonegram representation as input and whose output units correspond to the classes. We trained the CNN, called AReN, with a problem-driven data augmentation that extends the training dataset with gammatonegram images extracted from sounds acquired at different signal-to-noise ratios. We evaluated it on three freely available datasets, namely SESA, MIVIA Audio Events and MIVIA Road Events, and achieved recognition rates of 91.43%, 99.62% and 100%, respectively. We compared our method with other state-of-the-art methods based on both traditional machine learning and deep learning. The comparison confirms the effectiveness of the proposed approach, which outperforms the existing methods in terms of recognition rate. We show experimentally that the proposed network is resilient to noise, significantly reduces the false positive rate and generalizes to different scenarios. Furthermore, AReN is able to process 5 audio frames per second on a standard CPU and is therefore suitable for real audio surveillance applications.
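As an illustration of the signal-to-noise-ratio augmentation mentioned in the abstract, the following is a minimal Python sketch, not the authors' implementation. It scales a noise recording so that the mixture with a clean event reaches a requested SNR. The function names (`mix_at_snr`, `measured_snr_db`, `augment`), the synthetic signals and the SNR levels are hypothetical, and the gammatonegram extraction step is omitted.

```python
import math
import random


def mean_power(samples):
    """Average power (mean of squared samples) of a sequence."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(signal, noise, snr_db):
    """Add noise to signal, scaled so the mixture has the requested SNR in dB."""
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    p_signal = mean_power(signal)
    p_noise = mean_power(noise)
    if p_signal == 0 or p_noise == 0:
        raise ValueError("signal and noise must have non-zero power")
    # SNR_dB = 10 * log10(p_signal / (gain**2 * p_noise)), solved for gain.
    gain = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(signal, noise)]


def measured_snr_db(signal, mixture):
    """SNR in dB of a mixture, given the clean signal it was built from."""
    residual = [m - s for s, m in zip(signal, mixture)]
    return 10 * math.log10(mean_power(signal) / mean_power(residual))


def augment(signal, noise, snr_levels_db):
    """Return one noisy copy of the signal per requested SNR level."""
    return {snr: mix_at_snr(signal, noise, snr) for snr in snr_levels_db}


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-ins: a 440 Hz tone as the "event", Gaussian noise as background.
    event = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(800)]
    background = [rng.gauss(0.0, 1.0) for _ in range(800)]
    # Illustrative SNR levels, not taken from the paper.
    for snr, mixture in augment(event, background, [5, 10, 15]).items():
        print(snr, round(measured_snr_db(event, mixture), 3))
```

In a pipeline like the one the abstract describes, each noisy copy would then be converted to a time-frequency image before being passed to the network.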


Pages: 3610 - 3624

Page count: 15

