Representing Nonspeech Audio Signals through Speech Classification Models

Cited by: 0
Authors
Phan, Huy [1 ,2 ]
Hertel, Lars [1 ]
Maass, Marco [1 ]
Mazur, Radoslaw [1 ]
Mertins, Alfred [1 ]
Affiliations
[1] University of Lübeck, Institute for Signal Processing, Lübeck, Germany
[2] University of Lübeck, Graduate School for Computing in Medicine and Life Sciences, Lübeck, Germany
Source
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015
Keywords
feature learning; audio event; speech model; TIME;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
The human auditory system is very well matched to both human speech and environmental sounds. Therefore, the question arises whether human speech material may provide useful information for training systems for analyzing nonspeech audio signals, such as in a recognition task. To find out how similar nonspeech signals are to speech, we measure the closeness between target nonspeech signals and different basis speech categories via a speech classification model. The speech similarities are finally employed as a descriptor to represent the target signal. We further show that a better descriptor can be obtained by learning to organize the speech categories hierarchically with a tree structure. We conduct experiments for the audio event analysis application by using speech words from the TIMIT dataset to learn the descriptors for the audio events of the Freiburg-106 dataset. Our results on the event recognition task outperform those achieved by the best system even though a simple linear classifier is used. Furthermore, integrating the learned descriptors as an additional source leads to improved performance.
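The descriptor idea in the abstract can be illustrated with a minimal sketch: score a nonspeech feature vector against each speech category and use the vector of normalized scores as its representation. Everything here is an assumption made for illustration: the category names, the toy two-dimensional features, and the nearest-centroid scoring with a softmax are stand-ins and are not the speech classification model, the features, or the tree-structured hierarchy used by the authors.

```python
import math

def centroid(vectors):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def speech_descriptor(x, centroids):
    """Return one similarity score per speech category; scores sum to 1."""
    # Negative squared Euclidean distance serves as an unnormalized log-score.
    scores = [-sum((a - b) ** 2 for a, b in zip(x, c)) for c in centroids]
    # Softmax with max-subtraction for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy features for two made-up speech categories.
speech_training = {
    "word_a": [[0.0, 0.0], [0.2, 0.1]],
    "word_b": [[1.0, 1.0], [0.9, 1.1]],
}
categories = sorted(speech_training)
centroids = [centroid(speech_training[c]) for c in categories]

# Hypothetical feature vector of a nonspeech audio event.
event_features = [0.8, 0.9]
descriptor = speech_descriptor(event_features, centroids)
```

In this sketch `descriptor` holds one entry per speech category, and a vector of this kind is what a downstream linear classifier would take as input for event recognition.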
Pages: 3441-3445
Page count: 5