Front-End Feature Compensation for Noise Robust Speech Emotion Recognition

Cited by: 1

Authors:

Pandharipande, Meghna [1]

Chakraborty, Rupayan [1]

Panda, Ashish [1]

Das, Biswajit [1]

Kopparapu, Sunil Kumar [1]

Affiliations:

[1] TCS Res & Innovat Mumbai, Yantra Pk, Thana 400601, Maharashtra, India

Source:

2019 27TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO) | 2019

Keywords:

Emotion recognition; Noisy speech; Feature compensation; Auditory masking; Vector Taylor Series;

DOI:

10.23919/eusipco.2019.8902981

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Subject classification codes:

0808 ; 0809 ;

Abstract:

Robust feature compensation and selection are important aspects of the noisy speech emotion recognition (SER) task, especially in mismatched conditions, where models are trained on clean speech and tested in noisy scenarios. Here we propose the use of front-end feature compensation techniques based on Vector Taylor Series (VTS) expansion and VTS with auditory masking (VTS-AM) to improve the performance of SER systems. On top of VTS and VTS-AM, we compare log compression and root compression applied to the mel filter-bank energies. Further, we demonstrate the benefit of feature selection applied to the non-MFCC high-level descriptors in conjunction with VTS, VTS-AM, and root compression. The system performance is compared with the popular Non-negative Matrix Factorization (NMF) based enhancement and with an energy-based voice activity detector (VAD) technique, which discards silent or noisy frames in the spoken utterances. To demonstrate the efficacy of the proposed techniques, extensive experiments are conducted on 2 standard datasets (EmoDB and IEMOCAP), contaminated with 5 types of noise (Babble, F-16, Factory, Volvo, and HF-channel) from the Noisex-92 noise database at 5 SNR levels (0 dB, 5 dB, 10 dB, 15 dB, and 20 dB).
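As a rough illustration of two steps named in the abstract, the Python sketch below mixes a noise recording into clean speech at a target SNR and contrasts log and root compression of filter-bank energies. The function names, the root exponent of 0.1, and the energy floor are assumptions made for illustration; the record does not give the values or the implementation used in the paper.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` so that the speech-to-noise power ratio is `snr_db` dB."""
    speech = np.asarray(speech, dtype=float)
    # Repeat or truncate the noise so it covers the whole utterance.
    noise = np.resize(np.asarray(noise, dtype=float), speech.shape)
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Gain that brings the noise power to p_speech / 10^(snr_db / 10).
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise

def compress(energies, mode="log", root=0.1, floor=1e-10):
    """Compress filter-bank energies with a log or a power-law (root) function.

    `root` and `floor` are illustrative defaults, not values from the paper.
    """
    e = np.maximum(np.asarray(energies, dtype=float), floor)
    return np.log(e) if mode == "log" else e ** root
```

Under these assumptions, calling `mix_at_snr(clean, babble, 5.0)` would produce a 5 dB test utterance, and `compress(fbank, "root")` would stand in for the usual log step before the cepstral transform.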

Pages: 5

50 entries in total

[41] A Front-End Technique for Automatic Noisy Speech Recognition
Naing, Hay Mar Soe
Hidayat, Risanuri
Hartanto, Rudy
Miyanaga, Yoshikazu
PROCEEDINGS OF 2020 23RD CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (ORIENTAL-COCOSDA 2020), 2020: 49 - 54
[42] JOINT TRAINING OF FRONT-END AND BACK-END DEEP NEURAL NETWORKS FOR ROBUST SPEECH RECOGNITION
Gao, Tian
Du, Jun
Dai, Li-Rong
Lee, Chin-Hui
2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015: 4375 - 4379
[43] Speech Separation with EMD as Front-End for Noise Robust Co-Channel Speaker Identification
Kumar, Prasanna M. K.
Kumaraswamy, R.
2016 INTERNATIONAL CONFERENCE ON CIRCUITS, CONTROLS, COMMUNICATIONS AND COMPUTING (I4C), 2016
[44] Robust front-end for speech recognition based on computational auditory scene analysis and speaker model
Guan, Yong
Li, Peng
Liu, Wen-Ju
Xu, Bo
Zidonghua Xuebao / Acta Automatica Sinica, 2009, 35 (04): 410 - 416
[45] Recognizing voice over IP: A robust front-end for speech recognition on the World Wide Web
Peláez-Moreno, C
Gallardo-Antolín, A
Díaz-De-María, F
IEEE TRANSACTIONS ON MULTIMEDIA, 2001, 3 (02): 209 - 218
[46] A New Subband-Weighted MVDR-Based Front-End for Robust Speech Recognition
Seyedin, Sanaz
Ahadi, Seyed Mohammad
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, E93D (08): 2252 - 2261
[47] Robust automatic speech recognition using a multi-channel signal separation front-end
Yen, KC
Zhao, YX
ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996: 1337 - 1340
[48] Speech Feature Compensation Based on Pseudo Stereo Codebooks for Robust Speech Recognition in Additive Noise Environments
Hsieh, Tsung-hsueh
Hung, Jeih-weih
INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007: 2400 - 2403
[49] Noise reduction and echo cancellation front-end for speech codecs
Basbug, F
Swaminathan, K
Nandkumar, S
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2003, 11 (01): 1 - 13
[50] Residual noise compensation for robust speech recognition in nonstationary noise
Yao, KS
Shi, BE
Fung, P
Cao, ZG
2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000: 1125 - 1128