Robust emotional speech recognition based on binaural model and emotional auditory mask in noisy environments

Cited by: 12
Authors
Bashirpour, Meysam [1 ]
Geravanchizadeh, Masoud [1 ]
Affiliations
[1] Univ Tabriz, Fac Elect & Comp Engn, Tabriz 5166615813, Iran
Keywords
Emotional speech recognition; Binaural model; Emotional auditory mask; Classification of emotional states; Kaldi speech recognition system; Noise robustness; INTELLIGIBILITY; FEATURES; DATABASE;
DOI
10.1186/s13636-018-0133-9
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline Classification Code
070206; 082403
Abstract
The performance of automatic speech recognition systems degrades in the presence of emotional states and in adverse environments (e.g., noisy conditions). This greatly limits the deployment of speech recognition applications in realistic environments. Previous studies in the field of emotion-affected speech recognition focus on improving emotional speech recognition using clean speech data recorded in a quiet environment (i.e., controlled studio settings). The goal of this research is to increase the robustness of speech recognition systems for emotional speech in noisy conditions. The proposed binaural emotional speech recognition system is based on the analysis of the binaural input signal and an estimated emotional auditory mask corresponding to the recognized emotion. Whereas the binaural signal analyzer has the task of segregating speech from noise and constructing a speech mask in a noisy environment, the estimated emotional mask identifies and removes the most emotionally affected spectro-temporal regions of the segregated target speech. In other words, our proposed system combines the two estimated masks (the binary noise mask and the emotion-specific mask) to decrease the word error rate for noisy emotional speech. The performance of the proposed binaural system is evaluated in clean-neutral train/noisy-emotional test scenarios for different noise types, signal-to-noise ratios, and spatial configurations of sources. Speech utterances of the Persian emotional speech database are used for the experiments. Simulation results show that the proposed system achieves higher performance compared with baseline automatic speech recognition systems trained on neutral utterances.
Pages: 13
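
To make the mask-combination idea in the abstract concrete, below is a minimal sketch (not taken from the paper) of how a binaural binary mask and an emotion-specific mask could be merged by an element-wise AND over time-frequency units and then applied to the spectrogram of the segregated target speech. All function names, array shapes, thresholds, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def combine_masks(binaural_mask: np.ndarray, emotional_mask: np.ndarray) -> np.ndarray:
    """Combine a binaural (noise) binary mask with an emotion-specific mask.

    Both masks are assumed to be matrices of shape (freq_bins, frames) in which
    a value > 0 marks a reliable time-frequency unit. A unit is kept only if it
    is judged speech-dominated by the binaural analyzer AND not strongly
    emotion-affected, i.e. an element-wise logical AND of the two masks.
    """
    return np.logical_and(binaural_mask > 0, emotional_mask > 0).astype(float)

def apply_mask(spectrogram: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero out the time-frequency units marked unreliable by the combined mask."""
    return spectrogram * mask

# Toy usage with random data standing in for real binaural/emotional estimates.
rng = np.random.default_rng(0)
spec = np.abs(rng.standard_normal((64, 100)))             # |STFT| of segregated speech
noise_mask = (rng.random((64, 100)) > 0.3).astype(float)  # binaural binary mask (assumed)
emo_mask = (rng.random((64, 100)) > 0.1).astype(float)    # emotion-specific mask (assumed)
masked_spec = apply_mask(spec, combine_masks(noise_mask, emo_mask))
```

The masked spectrogram would then feed the recognizer's feature extraction stage (e.g., in a missing-data or mask-based front end), which is one plausible way to realize the combination described above.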