A PROGRESSIVE LEARNING APPROACH TO ADAPTIVE NOISE AND SPEECH ESTIMATION FOR SPEECH ENHANCEMENT AND NOISY SPEECH RECOGNITION

Cited by: 9
Authors
Nian, Zhaoxu [1]
Tu, Yan-Hui [1]
Du, Jun [1]
Lee, Chin-Hui [2]
Affiliations
[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China
[2] Georgia Inst Technol, Atlanta, GA 30332 USA
Funding
National Key Research and Development Program;
Keywords
Speech recognition; speech enhancement; progressive learning; improved minima controlled recursive averaging; adaptive noise and speech estimation;
DOI
10.1109/ICASSP39728.2021.9413395
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
In this paper, we propose a progressive learning-based adaptive noise and speech estimation (PL-ANSE) method for speech preprocessing in noisy speech recognition. The method leverages both the frame-level noise tracking capability of improved minima controlled recursive averaging (IMCRA) and utterance-level deep progressive learning of the nonlinear interactions between speech and noise. First, a bi-directional long short-term memory model is adopted at each network layer to learn progressive ratio masks (PRMs) as targets with progressively increasing signal-to-noise ratios. Then, the PRMs estimated at the utterance level are incorporated into a conventional frame-level speech enhancement algorithm. Finally, the enhanced speech, based on this multi-level information fusion, is fed directly into a speech recognition system to improve recognition performance. Experiments show that the proposed approach achieves a relative word error rate (WER) reduction of 22.1% compared with unprocessed noisy speech (from 23.84% to 18.57%) on the CHiME-4 single-channel real test data.
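The relative WER reduction quoted in the abstract can be checked from the two reported WER values. The following is a minimal sketch of that arithmetic only; the function name `relative_reduction` is hypothetical and is not taken from the paper.

```python
def relative_reduction(baseline_wer: float, enhanced_wer: float) -> float:
    """Return the relative WER reduction in percent (hypothetical helper)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - enhanced_wer) / baseline_wer


# WER values as reported in the abstract (CHiME-4 single-channel real test data).
baseline = 23.84  # unprocessed noisy speech, in percent
enhanced = 18.57  # after the proposed enhancement, in percent

absolute = baseline - enhanced
relative = relative_reduction(baseline, enhanced)

print(f"absolute reduction: {absolute:.2f} percentage points")
print(f"relative reduction: {relative:.1f}%")
```

The absolute drop of 5.27 percentage points, divided by the 23.84% baseline, gives the 22.1% relative figure stated in the abstract.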
Pages: 6913-6917
Number of pages: 5