Robust emotional speech recognition based on binaural model and emotional auditory mask in noisy environments

Cited by: 12
Authors
Bashirpour, Meysam [1]
Geravanchizadeh, Masoud [1]
Affiliation
[1] Univ Tabriz, Fac Elect & Comp Engn, Tabriz 5166615813, Iran
Keywords
Emotional speech recognition; Binaural model; Emotional auditory mask; Classification of emotional states; Kaldi speech recognition system; Noise robustness; INTELLIGIBILITY; FEATURES; DATABASE;
DOI
10.1186/s13636-018-0133-9
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
The performance of automatic speech recognition systems degrades in the presence of emotional states and in adverse environments (e.g., noisy conditions). This greatly limits the deployment of speech recognition applications in realistic environments. Previous studies in the emotion-affected speech recognition field have focused on improving emotional speech recognition using clean speech data recorded in a quiet environment (i.e., controlled studio settings). The goal of this research is to increase the robustness of speech recognition systems for emotional speech in noisy conditions. The proposed binaural emotional speech recognition system is based on the analysis of the binaural input signal and an estimated emotional auditory mask corresponding to the recognized emotion. Whereas the binaural signal analyzer has the task of segregating speech from noise and constructing a speech mask in a noisy environment, the estimated emotional mask identifies and removes the most emotionally affected spectro-temporal regions of the segregated target speech. In other words, the proposed system combines the two estimated masks (the binary mask for noise and the emotion-specific mask for emotion) as a way to decrease the word error rate for noisy emotional speech. The performance of the proposed binaural system is evaluated in clean neutral train/noisy emotional test scenarios for different noise types, signal-to-noise ratios, and spatial configurations of sources. Speech utterances from the Persian emotional speech database are used for the experiments. Simulation results show that the proposed system achieves higher performance than the baseline automatic speech recognition systems trained with neutral utterances.
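The idea of combining a noise mask with an emotion-specific mask can be illustrated with a toy sketch. The abstract does not state how the two masks are combined, so the element-wise product (a logical AND for binary masks) used here is an assumption, and the function names `combine_masks` and `apply_mask` as well as all values are hypothetical. This is not the authors' implementation.

```python
# Illustrative sketch only. ASSUMPTION: the two masks are combined with an
# element-wise product; the abstract does not specify the combination rule.

def combine_masks(noise_mask, emotion_mask):
    """Keep a time-frequency unit only if both masks keep it."""
    return [
        [n * e for n, e in zip(noise_row, emotion_row)]
        for noise_row, emotion_row in zip(noise_mask, emotion_mask)
    ]


def apply_mask(spectrogram, mask):
    """Zero out the time-frequency units that the mask rejects."""
    return [
        [value * keep for value, keep in zip(spec_row, mask_row)]
        for spec_row, mask_row in zip(spectrogram, mask)
    ]


# Toy 2 x 3 example (rows = frequency channels, columns = time frames).
# All numbers are made up for illustration.
spectrogram = [[0.9, 0.2, 0.7], [0.4, 0.8, 0.1]]
noise_mask = [[1, 0, 1], [1, 1, 0]]    # 1 = speech-dominated unit
emotion_mask = [[1, 1, 0], [1, 1, 1]]  # 0 = emotionally affected unit

combined = combine_masks(noise_mask, emotion_mask)
masked = apply_mask(spectrogram, combined)
```

With this rule, a unit survives only when it is both speech-dominated and not flagged as emotionally affected; a real system would estimate both masks from the binaural signal rather than hard-code them.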
Pages: 13