Representing Nonspeech Audio Signals through Speech Classification Models

Cited by: 0
Authors
Phan, Huy [1 ,2 ]
Hertel, Lars [1 ]
Maass, Marco [1 ]
Mazur, Radoslaw [1 ]
Mertins, Alfred [1 ]
Affiliations
[1] Univ Lubeck, Inst Signal Proc, Lubeck, Germany
[2] Univ Lubeck, Grad Sch Comp Med & Life Sci, Lubeck, Germany
Keywords
feature learning; audio event; speech model; TIME;
DOI: Not available
CLC Number: O42 [Acoustics]
Discipline Codes: 070206; 082403
Abstract
The human auditory system is very well matched to both human speech and environmental sounds. This raises the question of whether human speech material can provide useful information for training systems that analyze nonspeech audio signals, for example in a recognition task. To find out how similar nonspeech signals are to speech, we measure the closeness between target nonspeech signals and different basis speech categories via a speech classification model. The resulting speech similarities are then employed as a descriptor to represent the target signal. We further show that a better descriptor can be obtained by learning to organize the speech categories hierarchically in a tree structure. We conduct experiments on audio event analysis, using speech words from the TIMIT dataset to learn descriptors for the audio events of the Freiburg-106 dataset. Our results on the event recognition task outperform those of the best existing system, even though only a simple linear classifier is used. Furthermore, integrating the learned descriptors as an additional source leads to improved performance.
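To make the descriptor construction described above concrete, the Python sketch below (an assumed pipeline, not the authors' implementation) trains a classifier on basis speech categories and uses its per-class posterior probabilities, evaluated on nonspeech signals, as a speech-similarity descriptor that then feeds a simple linear classifier for event recognition. It assumes generic fixed-length acoustic feature vectors, uses synthetic stand-in data and scikit-learn models, and omits the hierarchical tree-structured variant.

# Minimal sketch (assumed pipeline, not the paper's code): represent a nonspeech
# sound by its similarities to basis speech categories, obtained as the posterior
# probabilities of a speech classification model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Stand-in data; in the paper, the speech words come from TIMIT and the target
# nonspeech signals are audio events from Freiburg-106.
n_speech, n_events, dim = 500, 200, 40
n_word_classes, n_event_classes = 10, 4
X_speech = rng.normal(size=(n_speech, dim))           # speech-word feature vectors
y_speech = rng.integers(0, n_word_classes, n_speech)  # basis speech categories
X_event = rng.normal(size=(n_events, dim))            # nonspeech event feature vectors
y_event = rng.integers(0, n_event_classes, n_events)  # event labels

# 1) Speech classification model over the basis speech categories.
speech_model = LogisticRegression(max_iter=1000).fit(X_speech, y_speech)

# 2) Speech-similarity descriptor: class posteriors of the speech model
#    evaluated on the nonspeech signals.
descriptors = speech_model.predict_proba(X_event)     # shape (n_events, n_word_classes)

# 3) Simple linear classifier on the descriptors for event recognition.
event_clf = LinearSVC().fit(descriptors, y_event)
print("training accuracy:", event_clf.score(descriptors, y_event))

Any probabilistic speech classifier could play the role of LogisticRegression here; the paper additionally refines the descriptor by learning a hierarchical organization of the speech categories.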
Pages: 3441 - 3445
Number of pages: 5