Robust emotional speech recognition based on binaural model and emotional auditory mask in noisy environments

Cited by: 12
Authors
Bashirpour, Meysam [1]
Geravanchizadeh, Masoud [1]
Affiliations
[1] Univ Tabriz, Fac Elect & Comp Engn, Tabriz 5166615813, Iran
Keywords
Emotional speech recognition; Binaural model; Emotional auditory mask; Classification of emotional states; Kaldi speech recognition system; Noise robustness; INTELLIGIBILITY; FEATURES; DATABASE
DOI
10.1186/s13636-018-0133-9
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The performance of automatic speech recognition systems degrades in the presence of emotional states and in adverse environments (e.g., noisy conditions). This greatly limits the deployment of speech recognition applications in realistic environments. Previous studies in the field of emotion-affected speech recognition have focused on improving emotional speech recognition using clean speech data recorded in a quiet environment (i.e., controlled studio settings). The goal of this research is to increase the robustness of speech recognition systems for emotional speech in noisy conditions. The proposed binaural emotional speech recognition system is based on the analysis of a binaural input signal and an estimated emotional auditory mask corresponding to the recognized emotion. Whereas the binaural signal analyzer has the task of segregating speech from noise and constructing a speech mask in a noisy environment, the estimated emotional mask identifies and removes the most emotionally affected spectro-temporal regions of the segregated target speech. In other words, the proposed system combines the two estimated masks (a binary mask for noise and an emotion-specific mask) as a way to decrease the word error rate for noisy emotional speech. The performance of the proposed binaural system is evaluated in clean neutral train/noisy emotional test scenarios for different noise types, signal-to-noise ratios, and spatial configurations of sources. Speech utterances from the Persian emotional speech database are used in the experiments. Simulation results show that the proposed system achieves higher performance than the baseline automatic speech recognition systems trained with neutral utterances.
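The idea of combining a noise mask with an emotion-specific mask can be illustrated with a minimal sketch. The abstract does not say how the two masks are estimated or merged, so everything below is an assumption made for illustration: both masks are taken to be binary (1 = keep, 0 = discard) over the same time-frequency grid, they are merged with a logical AND, and the function names `combine_masks` and `apply_mask` are hypothetical rather than taken from the paper or from Kaldi.

```python
from typing import List

Grid = List[List[float]]
Mask = List[List[int]]


def combine_masks(noise_mask: Mask, emotion_mask: Mask) -> Mask:
    """Keep a time-frequency unit only if BOTH masks keep it.

    Assumption: a logical AND is one plausible way to merge the masks;
    the paper's actual combination rule is not given in the abstract.
    """
    if len(noise_mask) != len(emotion_mask) or any(
        len(n_row) != len(e_row)
        for n_row, e_row in zip(noise_mask, emotion_mask)
    ):
        raise ValueError("masks must have the same shape")
    return [
        [int(bool(n) and bool(e)) for n, e in zip(n_row, e_row)]
        for n_row, e_row in zip(noise_mask, emotion_mask)
    ]


def apply_mask(spectrogram: Grid, mask: Mask, floor: float = 0.0) -> Grid:
    """Replace discarded units (mask == 0) with a floor value."""
    return [
        [value if keep else floor for value, keep in zip(s_row, m_row)]
        for s_row, m_row in zip(spectrogram, mask)
    ]


# Toy 2 x 3 grid (rows = frequency channels, columns = time frames).
spectrogram = [[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0]]
noise_mask = [[1, 1, 0],      # 0 = unit dominated by noise
              [0, 1, 1]]
emotion_mask = [[1, 0, 1],    # 0 = unit strongly affected by emotion
                [1, 1, 1]]

combined = combine_masks(noise_mask, emotion_mask)
masked = apply_mask(spectrogram, combined)
print(combined)
print(masked)
```

In this toy case only the units kept by both masks survive. In a real system the masked features would then be passed on to the recognizer, and how discarded units are handled there is outside the scope of this sketch.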
Pages: 13