Mask estimation for missing data speech recognition based on statistics of binaural interaction

Cited by: 38

Authors:

Harding, S [1]

Barker, J [1]

Brown, GJ [1]

Affiliations:

[1] Univ Sheffield, Dept Comp Sci, Sheffield S1 4DP, S Yorkshire, England

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2006, Vol. 14, No. 1

Keywords:

automatic speech recognition; binaural; computational auditory scene analysis (CASA); interaural level differences (ILD); interaural time differences (ITD); missing data; reverberation;

DOI:

10.1109/TSA.2005.860354

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

This paper describes a perceptually motivated computational auditory scene analysis (CASA) system that combines sound separation according to spatial location with the "missing data" approach for robust speech recognition in noise. Missing data time-frequency masks are created using probability distributions based on estimates of interaural time and level differences (ITD and ILD) for mixed utterances in reverberated conditions; these masks indicate which regions of the spectrum constitute reliable evidence of the target speech signal. A number of experiments compare the relative efficacy of the binaural cues when used individually and in combination. We also investigate the ability of the system to generalize to acoustic conditions not encountered during training. Performance on a continuous digit recognition task using this method is found to be good, even in a particularly challenging environment with three concurrent male talkers.

Pages: 58-67

Number of pages: 10

50 items in total

[21] A deep neural network approach for missing-data mask estimation on dual-microphone smartphones: Application to noise-robust speech recognition
López-Espejo, I.
González, José A.
Gómez, Ángel M.
Peinado, Antonio M.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2014, 8854 : 119 - 128
[22] A Deep Neural Network Approach for Missing-Data Mask Estimation on Dual-Microphone Smartphones: Application to Noise-Robust Speech Recognition
Lopez-Espejo, Ivan
Gonzalez, Jose A.
Gomez, Angel M.
Peinado, Antonio M.
ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES, IBERSPEECH 2014, 2014, 8854 : 119 - 128
[23] Mask Estimation Employing Posterior-Based Representative Mean for Missing-Feature Speech Recognition with Time-Varying Background Noise
Kim, Wooil
Hansen, John H. L.
2009 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION & UNDERSTANDING (ASRU 2009), 2009, : 194 - 198
[24] Model based Estimation of STP parameters for Binaural Speech Enhancement
Kavalekalam, Mathew Shaji
Nielsen, Jesper Kjaer
Christensen, Mads Graesboll
Boldt, Jesper
2018 26TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2018, : 2479 - 2483
[25] A study of speech recognition based on fuzzy statistics
Li, SL
Hou, CH
ICSP '96 - 1996 3RD INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, PROCEEDINGS, VOLS I AND II, 1996, : 757 - 760
[26] An iterative mask estimation approach to deep learning based multi-channel speech recognition
Tu, Yan-Hui
Du, Jun
Sun, Lei
Ma, Feng
Wang, Hai-Kun
Chen, Jing-Dong
Lee, Chin-Hui
SPEECH COMMUNICATION, 2019, 106 : 31 - 43
[27] Soft Missing-Feature Mask Generation for Simultaneous Speech Recognition System in Robots
Takahashi, Toru
Yamamoto, Shun'ichi
Nakadai, Kazuhiro
Komatani, Kazunori
Ogata, Tetsuya
Okuno, Hiroshi G.
INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 992 - +
[28] Efficient data selection for speech recognition based on prior confidence estimation
Kobashikawa, Satoshi
Asami, Taichi
Yamaguchi, Yoshikazu
Masataki, Hirokazu
Takahashi, Satoshi
ACOUSTICAL SCIENCE AND TECHNOLOGY, 2011, 32 (04) : 151 - 153
[29] Feature classification criterion for missing features mask estimation in robust speaker recognition
Ribas Gonzalez, Dayana
Calvo de Lara, Jose Ramon
SIGNAL IMAGE AND VIDEO PROCESSING, 2014, 8 (02) : 365 - 375
[30] Feature classification criterion for missing features mask estimation in robust speaker recognition
Dayana Ribas González
José Ramón Calvo de Lara
Signal, Image and Video Processing, 2014, 8 : 365 - 375