Real-Time monophonic and polyphonic audio classification from power spectra

Cited by: 7

Authors:

Baelde, Maxime [1, 2]

Biernacki, Christophe [2]

Greff, Raphael [1]

Affiliations:

[1] A-Volute, 19 Rue Ladrie, F-59491 Villeneuve d'Ascq, France

[2] Univ. Lille, INRIA, Modal team, CNRS, UMR 8524, Lab. Paul Painlevé, F-59000 Lille, France

Source:

Pattern Recognition | 2019 / Vol. 92

Keywords:

Real-time; Audio classification; Machine learning; Monophonic; Polyphonic; Generative model; Nonparametric estimation; MODEL;

DOI:

10.1016/j.patcog.2019.03.017

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This work addresses the recurring challenge of real-time monophonic and polyphonic audio source classification. The whole normalized power spectrum (NPS) is used directly in the proposed process, avoiding complex and risky traditional feature extraction. It is also a natural candidate for polyphonic events thanks to its additive property in such cases. The classification task is performed through nonparametric kernel-based generative modeling of the power spectrum. The advantage of this model is twofold: it is almost hypothesis-free, and it makes it straightforward to obtain the maximum a posteriori classification rule for online signals. Moreover, it makes use of the monophonic dataset to build the polyphonic one. Then, to reach the real-time target, the complexity of the method can be tuned by a standard hierarchical clustering preprocessing of the prototypes, which yields a particularly efficient trade-off between computation time and classification accuracy. The proposed method, called RARE (for Real-time Audio Recognition Engine), shows encouraging results in both monophonic and polyphonic classification tasks on benchmark and proprietary datasets, including the targeted real-time situation. In particular, this method has several advantages over state-of-the-art methods, including a reduced training time, no feature extraction, the ability to control the computation-accuracy trade-off, and no training on already mixed sounds for polyphonic classification. © 2019 Elsevier Ltd. All rights reserved.


Pages: 82-92

Number of pages: 11
