A PROGRESSIVE LEARNING APPROACH TO ADAPTIVE NOISE AND SPEECH ESTIMATION FOR SPEECH ENHANCEMENT AND NOISY SPEECH RECOGNITION

Cited by: 9
Authors
Nian, Zhaoxu [1]
Tu, Yan-Hui [1]
Du, Jun [1]
Lee, Chin-Hui [2]
Affiliations
[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China
[2] Georgia Inst Technol, Atlanta, GA 30332 USA
Funding
National Key Research and Development Program of China
Keywords
Speech recognition; speech enhancement; progressive learning; improved minima controlled recursive averaging; adaptive noise and speech estimation
DOI
10.1109/ICASSP39728.2021.9413395
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we propose a progressive learning-based adaptive noise and speech estimation (PL-ANSE) method for speech preprocessing in noisy speech recognition. The method leverages the frame-level noise-tracking capability of improved minima controlled recursive averaging (IMCRA) together with utterance-level deep progressive learning of the nonlinear interactions between speech and noise. First, a bi-directional long short-term memory model is adopted at each network layer to learn progressive ratio masks (PRMs) as targets with progressively increasing signal-to-noise ratios. Then, the PRMs estimated at the utterance level are combined with a conventional frame-level speech enhancement algorithm to enhance the speech. Finally, the enhanced speech, which is based on multi-level information fusion, is fed directly into a speech recognition system to improve recognition performance. Experiments show that the proposed approach achieves a relative word error rate (WER) reduction of 22.1% compared with unprocessed noisy speech (from 23.84% to 18.57%) on the CHiME-4 single-channel real test data.
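The relative WER reduction quoted in the abstract can be checked from the two absolute WER figures it reports. The short sketch below is only an arithmetic illustration and not part of the paper; the function name `relative_reduction` and the variable names are hypothetical.

```python
def relative_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Return the relative WER reduction as a percentage of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - improved_wer) / baseline_wer


# WER values (in percent) as reported in the abstract.
noisy_wer = 23.84     # unprocessed noisy speech
enhanced_wer = 18.57  # speech enhanced with PL-ANSE

absolute = noisy_wer - enhanced_wer                      # reduction in percentage points
relative = relative_reduction(noisy_wer, enhanced_wer)   # reduction relative to the baseline

print(f"absolute reduction: {absolute:.2f} points")
print(f"relative reduction: {relative:.1f}%")
```

The absolute drop is 5.27 percentage points, which is 22.1% of the 23.84% baseline, consistent with the figure stated in the abstract.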
Pages: 6913-6917
Number of pages: 5