Front-End Feature Compensation for Noise Robust Speech Emotion Recognition

被引:1
|
作者
Pandharipande, Meghna [1 ]
Chakraborty, Rupayan [1 ]
Panda, Ashish [1 ]
Das, Biswajit [1 ]
Kopparapu, Sunil Kumar [1 ]
机构
[1] TCS Res & Innovat Mumbai, Yantra Pk, Thana 400601, Maharashtra, India
关键词
Emotion recognition; Noisy speech; Feature compensation; Auditory masking; Vector Taylor Series;
D O I
10.23919/eusipco.2019.8902981
中图分类号
TM [电工技术]; TN [电子技术、通信技术];
学科分类号
0808 ; 0809 ;
摘要
Robust feature compensation and selection are important aspects of noisy speech emotion recognition (SER) task, especially in mismatched condition, when the models are trained on clean speech and tested in the noisy scenarios. Here we propose the use of front-end feature compensation techniques based on Vector Taylor Series (VTS) expansion and VTS with auditory masking (VTS-AM) to improve the performance of SER systems. On top of VTS and VTS-AM, we compare the performances of log-compression and root-compression to the mel-filter-bank energies. Further, we demonstrate the benefit of feature selection applied to the non-MFCC high-level descriptors in conjunction with VTS, VTS-AM and root compression. The system performance is compared with popular Non-negative Matrix Factorization (NMF) based enhancement and energy based voice activity detector (VAD) technique, which discards silence or noisy frames in the spoken utterances. To demonstrate the efficacy of our proposed techniques, extensive experiments are conducted on 2 standard datasets (EmoDB and IEMOCAP), contaminated with 5 types of noise (Babble, F-16, Factory, Volvo, and HF-channel) from the Noisex-92 noise database at 5 SNR levels (0dB, 5dB, 10dB, 15dB and 20dB).
引用
收藏
页数:5
相关论文
共 50 条
  • [21] A Multichannel Noise Reduction Front-end based on psychoacoustics for robust speech recognition in highly noisy environments
    Cifani, Simone
    Principi, Emanuele
    Rocchi, Cesare
    Squartini, Stefano
    Piazza, Francesco
    2008 HANDS-FREE SPEECH COMMUNICATION AND MICROPHONE ARRAYS, 2008, : 173 - 176
  • [22] ROBUST FEATURE FRONT-END FOR SPEAKER IDENTIFICATION
    Liu, Gang
    Lei, Yun
    Hansen, John H. L.
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4233 - 4236
  • [23] Feature enhancement for a bitstream-based front-end in wireless speech recognition
    Kim, HK
    Cox, RV
    2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING, 2001, : 241 - 244
  • [24] Feature compensation based on independent noise estimation for robust speech recognition
    Lu, Yong
    Lin, Han
    Wu, Pingping
    Chen, Yitao
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2021, 2021 (01)
  • [25] Feature compensation based on independent noise estimation for robust speech recognition
    Yong Lü
    Han Lin
    Pingping Wu
    Yitao Chen
    EURASIP Journal on Audio, Speech, and Music Processing, 2021
  • [26] A companding front end for noise-robust automatic speech recognition
    Guinness, J
    Raj, B
    Schmidt-Nielsen, B
    Turicchia, L
    Sarpeshkar, R
    2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 249 - 252
  • [27] Investigation into a Mel subspace based front-end processing for robust speech recognition
    Selouani, SA
    O'Shaughnessy, D
    Proceedings of the Fourth IEEE International Symposium on Signal Processing and Information Technology, 2004, : 187 - 190
  • [28] A noise-robust front-end based on tree-structured filter-bank for speech recognition
    Kil, RM
    Kim, YI
    Lee, GH
    IJCNN 2000: PROCEEDINGS OF THE IEEE-INNS-ENNS INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOL VI, 2000, : 81 - 86
  • [29] An efficient front-end for automatic speech recognition
    Ahadi, SM
    Sheikhzadeh, H
    Brennan, RL
    Freeman, GH
    ICECS 2003: PROCEEDINGS OF THE 2003 10TH IEEE INTERNATIONAL CONFERENCE ON ELECTRONICS, CIRCUITS AND SYSTEMS, VOLS 1-3, 2003, : 128 - 131
  • [30] Implementation of an acoustic front-end for speech recognition
    Albarello, Alain
    Breitschaedel, Robert
    Ciaramella, Alberto
    Lenormand, Eric
    Pacifici, Roberto
    Potage, Jean
    Riviere, Jean-Pierre
    Scheibel, Norbert
    Venuti, Giovanni
    CSELT Technical Reports, 1988, 16 (05): : 455 - 459