An Investigation of Spectral Restoration Algorithms for Deep Neural Networks based Noise Robust Speech Recognition

Cited by: 0

Authors:

Li, Bo [1]

Tsao, Yu [2]

Sim, Khe Chai [1]

Affiliations:

[1] Natl Univ Singapore, Sch Comp Comp 1, Singapore, Singapore

[2] Acad Sinica, Res Ctr Informat Technol Innovat CITI, Taipei, Taiwan

Source:

14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013

Keywords:

speech enhancement; spectral restoration; deep neural networks; ENHANCEMENT;

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Deep Neural Networks (DNNs) are becoming widely accepted in automatic speech recognition (ASR) systems. The deep structured nonlinear processing greatly improves the model's generalization capability, but the performance under adverse environments is still unsatisfactory. In the literature, there have been many techniques successfully developed to improve Gaussian mixture models' robustness. Investigating the effectiveness of these techniques for the DNN is an important step to thoroughly understand its superiority, pinpoint its limitations and most importantly to further improve it towards the ultimate human-level robustness. In this paper, we investigate the effectiveness of speech enhancement using spectral restoration algorithms for DNNs. Four approaches are evaluated, namely minimum mean-square error spectral estimator (MMSE), maximum likelihood spectral amplitude estimator (MLSA), maximum a posteriori spectral amplitude estimator (MAPA), and generalized maximum a posteriori spectral amplitude algorithm (GMAPA). The preliminary experimental results on the Aurora 2 speech database show that with multi-condition training data the DNN itself is capable of learning robust representations. However, if only clean data is available, the MLSA algorithm is the best spectral restoration training method for DNNs.

Pages: 3001 / +

Page count: 2
