AReN: A Deep Learning Approach for Sound Event Recognition Using a Brain Inspired Representation

Cited by: 35

Authors:

Greco, Antonio ^{[1]}

Petkov, Nicolai ^{[2]}

Saggese, Alessia ^{[1]}

Vento, Mario ^{[1]}

Affiliations:

[1] Univ Salerno, Dept Informat Engn Elect Engn & Appl Math, I-84084 Fisciano, Italy

[2] Univ Groningen, Fac Sci & Engn, NL-9712 CP Groningen, Netherlands

Source:

IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY | 2020 / Vol. 15

Keywords:

Training; Time-frequency analysis; Machine learning; Spectrogram; Surveillance; Signal to noise ratio; Standards; audio surveillance; deep learning; CNN; gammatonegram; brain inspired representation; NEURAL-NETWORK; CLASSIFICATION; SURVEILLANCE; FEATURES; PATTERN;

DOI:

10.1109/TIFS.2020.2994740

Chinese Library Classification (CLC) number:

TP301 [Theory, methods];

Discipline classification code:

081202;

Abstract:

Audio surveillance has attracted wide interest in recent years, owing to the large number of situations in which such systems can be used, either alone or combined with video-based algorithms. In this paper we propose a deep learning method to automatically recognize events of interest in the context of audio surveillance (namely screams, glass breaking and gunshots). The audio stream is represented by a gammatonegram image. We propose a 21-layer CNN to which we feed sections of the gammatonegram representation; the output units of this CNN correspond to the classes. We trained the CNN, called AReN, using a problem-driven data augmentation that extends the training dataset with gammatonegram images extracted from sounds acquired at different signal-to-noise ratios. We evaluated it on three freely available datasets, namely SESA, MIVIA Audio Events and MIVIA Road Events, and achieved recognition rates of 91.43%, 99.62% and 100%, respectively. We compared our method with other state-of-the-art approaches based on both traditional machine learning and deep learning. The comparison confirms the effectiveness of the proposed approach, which outperforms the existing methods in terms of recognition rate. We show experimentally that the proposed network is resilient to noise, can significantly reduce the false positive rate, and generalizes across different scenarios. Furthermore, AReN is able to process 5 audio frames per second on a standard CPU and is consequently suitable for real audio surveillance applications.
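
The abstract mentions an augmentation that adds training samples acquired at different signal-to-noise ratios. The following is a minimal sketch of one common way to produce such variants: scaling a background recording so that the event-to-noise power ratio matches a target SNR in dB, then summing the two signals. It is not the authors' procedure. The function names (`mix_at_snr`, `augment`), the synthetic sine and Gaussian signals, and the SNR levels are all hypothetical choices made for illustration, and the gammatonegram extraction step is not covered.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_snr(event, noise, snr_db):
    """Scale `noise` so the event/noise power ratio equals `snr_db`, then add it.

    Returns the mixture and the scaled noise (hypothetical helper).
    """
    if len(event) != len(noise):
        raise ValueError("event and noise must have the same length")
    p_event = power(event)
    p_noise = power(noise)
    if p_event == 0 or p_noise == 0:
        raise ValueError("signals must have non-zero power")
    # Required noise power is P_event / 10^(SNR/10); the gain is an amplitude
    # factor, hence the square root.
    gain = math.sqrt(p_event / (p_noise * 10 ** (snr_db / 10)))
    scaled_noise = [gain * n for n in noise]
    mixture = [e + n for e, n in zip(event, scaled_noise)]
    return mixture, scaled_noise


def measured_snr_db(event, scaled_noise):
    """SNR in dB between the clean event and the noise actually added."""
    return 10 * math.log10(power(event) / power(scaled_noise))


def augment(event, noise, snr_levels_db):
    """Build one noisy variant of `event` per requested SNR level."""
    return {snr: mix_at_snr(event, noise, snr)[0] for snr in snr_levels_db}


# Synthetic stand-ins for a recorded event and a background recording
# (assumption: real data would be loaded from audio files instead).
rng = random.Random(0)
event = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

# Illustrative SNR levels only; the abstract does not state the values used.
variants = augment(event, noise, [5, 15, 25])
```

Each entry in `variants` would then be converted to a time-frequency image before being fed to a classifier; lower SNR values produce harder training examples.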


Pages: 3610-3624

Number of pages: 15
