Representing Nonspeech Audio Signals through Speech Classification Models

Cited by: 0
Authors
Phan, Huy [1 ,2 ]
Hertel, Lars [1 ]
Maass, Marco [1 ]
Mazur, Radoslaw [1 ]
Mertins, Alfred [1 ]
Affiliations
[1] Univ Lubeck, Inst Signal Proc, Lubeck, Germany
[2] Univ Lubeck, Grad Sch Comp Med & Life Sci, Lubeck, Germany
Keywords
feature learning; audio event; speech model; TIME;
DOI: Not available
CLC Number: O42 [Acoustics]
Discipline Codes: 070206; 082403
Abstract
The human auditory system is very well matched to both human speech and environmental sounds. This raises the question of whether human speech material can provide useful information for training systems that analyze nonspeech audio signals, for example in a recognition task. To find out how similar nonspeech signals are to speech, we measure the closeness between target nonspeech signals and different basis speech categories via a speech classification model. The resulting speech similarities are then employed as a descriptor to represent the target signal. We further show that a better descriptor can be obtained by learning to organize the speech categories hierarchically in a tree structure. We conduct experiments on audio event analysis, using speech words from the TIMIT dataset to learn descriptors for the audio events of the Freiburg-106 dataset. Our results on the event recognition task outperform those of the best existing system, even though only a simple linear classifier is used. Furthermore, integrating the learned descriptors as an additional source leads to improved performance.
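To make the descriptor construction described above concrete, the Python sketch below (an assumed pipeline, not the authors' implementation) trains a classifier on basis speech categories and uses its per-class posterior probabilities, evaluated on nonspeech signals, as a speech-similarity descriptor that then feeds a simple linear classifier for event recognition. It assumes generic fixed-length acoustic feature vectors, uses synthetic stand-in data and scikit-learn models, and omits the hierarchical tree-structured variant.

# Minimal sketch (assumed pipeline, not the paper's code): represent a nonspeech
# sound by its similarities to basis speech categories, obtained as the posterior
# probabilities of a speech classification model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Stand-in data; in the paper, the speech words come from TIMIT and the target
# nonspeech signals are audio events from Freiburg-106.
n_speech, n_events, dim = 500, 200, 40
n_word_classes, n_event_classes = 10, 4
X_speech = rng.normal(size=(n_speech, dim))           # speech-word feature vectors
y_speech = rng.integers(0, n_word_classes, n_speech)  # basis speech categories
X_event = rng.normal(size=(n_events, dim))            # nonspeech event feature vectors
y_event = rng.integers(0, n_event_classes, n_events)  # event labels

# 1) Speech classification model over the basis speech categories.
speech_model = LogisticRegression(max_iter=1000).fit(X_speech, y_speech)

# 2) Speech-similarity descriptor: class posteriors of the speech model
#    evaluated on the nonspeech signals.
descriptors = speech_model.predict_proba(X_event)     # shape (n_events, n_word_classes)

# 3) Simple linear classifier on the descriptors for event recognition.
event_clf = LinearSVC().fit(descriptors, y_event)
print("training accuracy:", event_clf.score(descriptors, y_event))

Any probabilistic speech classifier could play the role of LogisticRegression here; the paper additionally refines the descriptor by learning a hierarchical organization of the speech categories.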
Pages: 3441 - 3445
Number of pages: 5