Real-time monophonic and polyphonic audio classification from power spectra

Cited by: 7

Authors:

Baelde, Maxime [1, 2]

Biernacki, Christophe [2]

Greff, Raphael [1]

Affiliations:

[1] A Volute, 19 Rue Ladrie, F-59491 Villeneuve d'Ascq, France

[2] Univ Lille, INRIA, Modal team, CNRS, UMR 8524, Lab Paul Painleve, F-59000 Lille, France

Source:

PATTERN RECOGNITION | 2019 / Vol. 92

Keywords:

Real-time; Audio classification; Machine learning; Monophonic; Polyphonic; Generative model; Nonparametric estimation; MODEL;

DOI:

10.1016/j.patcog.2019.03.017

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This work addresses the recurring challenge of real-time monophonic and polyphonic audio source classification. The proposed method uses the whole normalized power spectrum (NPS) directly, avoiding complex and risky traditional feature extraction. The NPS is also a natural candidate for polyphonic events thanks to its additive property in such cases. Classification is performed through nonparametric kernel-based generative modeling of the power spectrum. The advantage of this model is twofold: it is almost hypothesis-free, and it yields the maximum a posteriori classification rule for online signals in a straightforward way. Moreover, it uses the monophonic dataset to build the polyphonic one. To reach the real-time target, the complexity of the method can be tuned by a standard hierarchical clustering preprocessing of the prototypes, which gives a particularly efficient trade-off between computation time and classification accuracy. The proposed method, called RARE (Real-time Audio Recognition Engine), shows encouraging results in both monophonic and polyphonic classification tasks on benchmark and proprietary datasets, including in the targeted real-time setting. In particular, it has several advantages over state-of-the-art methods: reduced training time, no feature extraction, control over the computation-accuracy trade-off, and no training on already mixed sounds for polyphonic classification. (C) 2019 Elsevier Ltd. All rights reserved.
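
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Gaussian kernel, the Euclidean distance, the bandwidth value, the uniform class priors, the fixed 50/50 mixing weight, and all function names are assumptions made for illustration, since the abstract does not specify them. The sketch computes an NPS per frame, builds polyphonic prototypes as convex combinations of monophonic ones (relying on the approximate additivity of power spectra), and picks the class with the highest kernel density score.

```python
import numpy as np

SAMPLE_RATE = 8000  # Hz, illustrative value
FRAME_LEN = 256     # samples per frame, illustrative value


def tone(freq_hz, phase):
    """Synthetic sine frame standing in for a recorded sound."""
    t = np.arange(FRAME_LEN) / SAMPLE_RATE
    return np.sin(2.0 * np.pi * freq_hz * t + phase)


def normalized_power_spectrum(frame):
    """Power spectrum of one frame, scaled so that it sums to 1."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    return power / power.sum()


def mixed_prototypes(protos_a, protos_b, weight=0.5):
    """All pairwise convex combinations of two monophonic prototype sets.

    Assumption: the NPS of a mixture is close to a weighted average of the
    source NPS vectors; the fixed weight is a simplification.
    """
    mixed = weight * protos_a[:, None, :] + (1.0 - weight) * protos_b[None, :, :]
    return mixed.reshape(-1, protos_a.shape[1])


def log_kernel_likelihood(x, prototypes, bandwidth=0.05):
    """Log of a Gaussian-kernel density estimate of x from class prototypes.

    The kernel's normalizing constant is omitted: with a shared bandwidth it
    is identical for every class and does not change the argmax.
    """
    sq_dist = np.sum((prototypes - x) ** 2, axis=1)
    log_k = -sq_dist / (2.0 * bandwidth ** 2)
    peak = log_k.max()  # log-mean-exp for numerical stability
    return peak + np.log(np.mean(np.exp(log_k - peak)))


def map_classify(x, class_prototypes):
    """Maximum a posteriori label, assuming uniform class priors."""
    labels = sorted(class_prototypes)
    scores = [log_kernel_likelihood(x, class_prototypes[lab]) for lab in labels]
    return labels[int(np.argmax(scores))]


# Monophonic prototypes: the same tone observed at a few phases.
phases = [0.0, 0.7, 1.4, 2.1]
low = np.array([normalized_power_spectrum(tone(440.0, p)) for p in phases])
high = np.array([normalized_power_spectrum(tone(1000.0, p)) for p in phases])

# The polyphonic class is built from monophonic data only.
classes = {"low": low, "high": high, "low+high": mixed_prototypes(low, high)}

pred_low = map_classify(normalized_power_spectrum(tone(440.0, 0.3)), classes)
pred_mix = map_classify(
    normalized_power_spectrum(tone(440.0, 0.3) + tone(1000.0, 1.1)), classes
)
print(pred_low, pred_mix)
```

The hierarchical clustering step mentioned in the abstract is not shown; in this sketch it would amount to replacing each class's prototype array with a smaller set of cluster representatives before calling `map_classify`, which reduces the number of kernel evaluations per frame.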


Pages: 82-92

Page count: 11
