Representing Nonspeech Audio Signals through Speech Classification Models

Cited by: 0

Authors:

Phan, Huy [1,2]

Hertel, Lars [1]

Maass, Marco [1]

Mazur, Radoslaw [1]

Mertins, Alfred [1]

Affiliations:

[1] Univ Lubeck, Inst Signal Proc, Lubeck, Germany

[2] Univ Lubeck, Grad Sch Comp Med & Life Sci, Lubeck, Germany

Source:

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015

Keywords:

feature learning; audio event; speech model; TIME;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

The human auditory system is very well matched to both human speech and environmental sounds. Therefore, the question arises whether human speech material may provide useful information for training systems for analyzing nonspeech audio signals, such as in a recognition task. To find out how similar nonspeech signals are to speech, we measure the closeness between target nonspeech signals and different basis speech categories via a speech classification model. The speech similarities are finally employed as a descriptor to represent the target signal. We further show that a better descriptor can be obtained by learning to organize the speech categories hierarchically with a tree structure. We conduct experiments for the audio event analysis application by using speech words from the TIMIT dataset to learn the descriptors for the audio events of the Freiburg-106 dataset. Our results on the event recognition task outperform those achieved by the best system even though a simple linear classifier is used. Furthermore, integrating the learned descriptors as an additional source leads to improved performance.
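The core idea of the abstract, describing a nonspeech signal by its similarity to a set of speech categories, can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid model stands in for whatever speech classification model the paper uses, the features are synthetic random vectors rather than TIMIT or Freiburg-106 data, and all function names are hypothetical. It shows only the flat variant and does not cover the hierarchical tree of speech categories.

```python
import numpy as np


def fit_centroids(features, labels):
    """Compute one mean vector per speech category.

    Assumption: a nearest-centroid model stands in for the speech
    classification model described in the abstract.
    """
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def speech_similarity_descriptor(x, centroids):
    """Return one similarity score per speech category for signal x.

    The scores are a softmax over negative squared distances, so they
    are non-negative and sum to one.
    """
    squared_dist = ((centroids - x) ** 2).sum(axis=1)
    logits = -squared_dist
    logits = logits - logits.max()  # shift for numerical stability
    scores = np.exp(logits)
    return scores / scores.sum()


# Synthetic stand-in data (assumption: real systems would use acoustic features).
rng = np.random.default_rng(0)
n_categories, dim = 4, 8
category_means = 3.0 * rng.normal(size=(n_categories, dim))
labels = np.repeat(np.arange(n_categories), 20)
speech_features = category_means[labels] + rng.normal(size=(labels.size, dim))

classes, centroids = fit_centroids(speech_features, labels)

# A hypothetical nonspeech feature vector, described by its speech similarities.
nonspeech_signal = rng.normal(size=dim)
descriptor = speech_similarity_descriptor(nonspeech_signal, centroids)
```

The resulting `descriptor` has one entry per speech category and could then be passed to a simple classifier for event recognition, which is the role the abstract assigns to the linear classifier.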


Pages: 3441-3445

Page count: 5
