An Investigation of Spectral Restoration Algorithms for Deep Neural Networks based Noise Robust Speech Recognition

Authors
Li, Bo [1]
Tsao, Yu [2]
Sim, Khe Chai [1]
Affiliations
[1] Natl Univ Singapore, Sch Comp Comp 1, Singapore, Singapore
[2] Acad Sinica, Res Ctr Informat Technol Innovat CITI, Taipei, Taiwan
Keywords
speech enhancement; spectral restoration; deep neural networks; enhancement
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Deep Neural Networks (DNNs) are becoming widely accepted in automatic speech recognition (ASR) systems. Their deep, nonlinear processing structure greatly improves the model's generalization capability, but performance in adverse environments is still unsatisfactory. Many techniques have been developed in the literature to improve the robustness of Gaussian mixture models (GMMs). Investigating how effective these techniques are for DNNs is an important step toward understanding the DNN's advantages, pinpointing its limitations and, most importantly, improving it further toward human-level robustness. In this paper, we investigate the effectiveness of speech enhancement using spectral restoration algorithms for DNNs. Four approaches are evaluated: the minimum mean-square error spectral estimator (MMSE), the maximum likelihood spectral amplitude estimator (MLSA), the maximum a posteriori spectral amplitude estimator (MAPA), and the generalized maximum a posteriori spectral amplitude algorithm (GMAPA). Preliminary experimental results on the Aurora 2 speech database show that, with multi-condition training data, the DNN itself is capable of learning robust representations. However, when only clean training data is available, MLSA is the most effective of the four spectral restoration methods for DNNs.
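The abstract names the estimators without giving their formulas. As a hedged illustration of how an ML amplitude estimator, a MAP amplitude estimator and a "generalized" MAP estimator can relate, the Python sketch below computes a per-bin spectral gain. It assumes a Gaussian noise model, a Rayleigh prior on the clean amplitude, and the large-argument approximation of the Bessel function I0 in the phase-averaged likelihood. The exponent `alpha` that weights the prior is our own assumption about how a GMAPA-style estimator can be parameterized (alpha = 0 gives an ML gain, alpha = 1 an amplitude-only MAP gain); it is not necessarily the paper's definition. The names `weighted_map_gain` and `restore_frame` are hypothetical, and the MMSE estimator is omitted.

```python
import math


def weighted_map_gain(xi: float, gamma: float, alpha: float) -> float:
    """Gain G = |X_hat| / |Y| for one time-frequency bin.

    Hypothetical weighted-prior estimator (an assumption, not the paper's
    formula): maximizes log p(Y|X) + alpha * log p(X) with Gaussian noise,
    a Rayleigh prior on the clean amplitude X, and I0(z) ~ exp(z)/sqrt(2*pi*z).
      alpha = 0 -> ML gain 1/2 + 1/2*sqrt(1 - 1/gamma)
      alpha = 1 -> amplitude-only MAP gain
    xi: a priori SNR (> 0); gamma: a posteriori SNR |Y|^2 / noise_psd (> 0).
    """
    arg = 1.0 + (2.0 * alpha - 1.0) * (alpha + xi) / (xi * gamma)
    # For alpha < 0.5 and low gamma the argument can be negative; clamp it.
    arg = max(arg, 0.0)
    return xi * (1.0 + math.sqrt(arg)) / (2.0 * (alpha + xi))


def restore_frame(noisy_power, noise_power, alpha, xi_floor=1e-3):
    """Apply the gain bin by bin to one frame of noisy power spectra.

    The a priori SNR is estimated with the simple ML rule
    xi = max(gamma - 1, xi_floor); this choice is an assumption
    (decision-directed estimation is a common alternative).
    """
    restored = []
    for y2, n2 in zip(noisy_power, noise_power):
        gamma = y2 / n2
        xi = max(gamma - 1.0, xi_floor)
        g = weighted_map_gain(xi, gamma, alpha)
        restored.append(g * g * y2)  # restored power = G^2 * |Y|^2
    return restored


if __name__ == "__main__":
    noisy = [4.0, 1.0, 0.5]   # toy noisy power spectrum (one frame)
    noise = [1.0, 1.0, 1.0]   # toy noise power estimate
    print("ML-style  :", restore_frame(noisy, noise, alpha=0.0))
    print("MAP-style :", restore_frame(noisy, noise, alpha=1.0))
    print("alpha=0.5 :", restore_frame(noisy, noise, alpha=0.5))
```

With alpha = 0 the gain always lies between 0.5 and 1, a known property of the ML amplitude estimator, while alpha = 1 attenuates low-SNR bins much more strongly; intermediate values of alpha trade off between the two behaviors.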
Pages: 3001 / +
Number of pages: 2