Waveform-Domain Speech Enhancement Using Spectrogram Encoding for Robust Speech Recognition

Cited by: 4
Authors
Shi, Hao [1]
Mimura, Masato [1]
Kawahara, Tatsuya [1]
Affiliations
[1] Kyoto Univ, Grad Sch Informat, Kyoto 6068501, Japan
Keywords
Speech enhancement; robust automatic speech recognition (ASR); time-frequency hybrid model; spectral information refining; FRAMEWORK;
DOI
10.1109/TASLP.2024.3407511
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
While waveform-domain speech enhancement (SE) has been extensively investigated in recent years and achieves state-of-the-art performance on many datasets, spectrogram-based SE tends to show robust and stable enhancement behavior. In this paper, we propose a waveform-spectrogram hybrid method (WaveSpecEnc) to improve the robustness of waveform-domain SE. WaveSpecEnc refines the corresponding temporal feature map by spectrogram encoding in each encoder layer. Incorporating spectral information yields robust perceptual quality for human listeners, but only a minor improvement in automatic speech recognition (ASR). Thus, we improve it for robust ASR by further applying the spectrogram encoding information (WaveSpecEnc+) to both the SE front-end and the ASR back-end. Experimental results on the CHiME-4 dataset show that ASR performance on the real evaluation sets is consistently improved with the proposed method, which outperforms others, including DEMUCS and Conv-TasNet. Refining in the shallow encoder layers is very effective, and the effect is confirmed even with a strong ASR baseline using WavLM.
Pages: 3049-3060
Number of pages: 12