Waveform-Domain Speech Enhancement Using Spectrogram Encoding for Robust Speech Recognition

Cited by: 4
Authors
Shi, Hao [1]
Mimura, Masato [1]
Kawahara, Tatsuya [1]
Affiliations
[1] Kyoto Univ, Grad Sch Informat, Kyoto 6068501, Japan
Keywords
Speech enhancement; robust automatic speech recognition (ASR); time-frequency hybrid model; spectral information refining; FRAMEWORK
DOI
10.1109/TASLP.2024.3407511
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Waveform-domain speech enhancement (SE) has been extensively investigated in recent years and achieves state-of-the-art performance on many datasets, whereas spectrogram-based SE tends to show robust and stable enhancement behavior. In this paper, we propose a waveform-spectrogram hybrid method (WaveSpecEnc) to improve the robustness of waveform-domain SE. In each encoder layer, WaveSpecEnc refines the corresponding temporal feature map with spectrogram encoding. Incorporating spectral information yields robust gains in perceptual quality for human listeners, but only a minor improvement in automatic speech recognition (ASR). We therefore extend the method for robust ASR by feeding the spectrogram encoding information to both the SE front-end and the ASR back-end (WaveSpecEnc+). Experimental results on the CHiME-4 dataset show that the proposed method consistently improves ASR performance on the real evaluation sets and outperforms other methods, including DEMUCS and Conv-TasNet. Refining in the shallow encoder layers is particularly effective, and the effect is confirmed even with a strong ASR baseline using WavLM.
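The abstract describes refining a temporal feature map with spectrogram encoding inside each encoder layer. The following is a minimal sketch of that general idea only, not the authors' architecture: the abstract does not specify the fusion operation, so the sigmoid gating, the function names `magnitude_spectrogram` and `refine`, the random `weights` projection, and all sizes are assumptions chosen for illustration. The temporal feature map is a random stand-in for the output of a waveform encoder layer.

```python
import cmath
import math
import random


def magnitude_spectrogram(wave, frame_len=16, hop=8):
    """Naive STFT magnitude: one list of frame_len // 2 + 1 bins per frame."""
    # Periodic Hann window to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(wave) - frame_len + 1, hop):
        seg = [wave[start + n] * window[n] for n in range(frame_len)]
        bins = []
        for k in range(frame_len // 2 + 1):
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


def refine(temporal, spec, weights):
    """Gate a temporal feature map with a projection of the spectrogram.

    temporal: [T][C] feature map (stand-in for a waveform encoder layer output).
    spec:     [T][F] magnitude spectrogram aligned to the same T frames.
    weights:  [C][F] hypothetical projection; it would be learned in practice.
    Sigmoid gating is an assumption, not the operation used in the paper.
    """
    refined = []
    for t_feat, s_frame in zip(temporal, spec):
        row = []
        for c, value in enumerate(t_feat):
            proj = sum(w * s for w, s in zip(weights[c], s_frame))
            gate = 1.0 / (1.0 + math.exp(-proj))  # sigmoid, strictly in (0, 1)
            row.append(value * gate)
        refined.append(row)
    return refined


random.seed(0)
# Toy input: a sinusoid with a period of 16 samples, 64 samples long.
wave = [math.sin(2 * math.pi * 4 * n / 64) for n in range(64)]
spec = magnitude_spectrogram(wave)
channels = 4
temporal = [[random.uniform(-1, 1) for _ in range(channels)] for _ in spec]
weights = [[random.uniform(-0.1, 0.1) for _ in range(len(spec[0]))]
           for _ in range(channels)]
refined = refine(temporal, spec, weights)
print(len(refined), len(refined[0]))
```

In this sketch the spectrogram frames are assumed to be already aligned with the temporal frames. A real encoder with strided convolutions would need the STFT hop chosen, or the spectrogram resampled, so that the two time axes match at every layer.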
Pages: 3049-3060
Page count: 12
Related papers
50 in total
  • [1] Robust speech recognition using the modulation spectrogram
    Kingsbury, BED
    Morgan, N
    Greenberg, S
    SPEECH COMMUNICATION, 1998, 25 (1-3) : 117 - 132
  • [2] Robust distributed speech recognition using speech enhancement
    Flynn, Ronan
    Jones, Edward
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2008, 54 (03) : 1267 - 1273
  • [3] Robust recognition of noisy speech using speech enhancement
    Xu, YF
    Zhang, JJ
    Yao, KS
    Cao, ZG
    Ma, ZX
    2000 5TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS I-III, 2000, : 734 - 737
  • [4] Time-Domain Speech Enhancement for Robust Automatic Speech Recognition
    Yang, Yufeng
    Pandey, Ashutosh
    Wang, DeLiang
    INTERSPEECH 2023, 2023, : 4913 - 4917
  • [5] SPEECH ENHANCEMENT FOR ROBUST SPEECH RECOGNITION IN MOTORCYCLE ENVIRONMENT
    Mporas, Iosif
    Ganchev, Todor
    Kocsis, Otilia
    Fakotakis, Nikos
    INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2010, 19 (02) : 159 - 173
  • [6] Compensation of speech enhancement distortion for robust speech recognition
    Ding, P
    Cao, ZG
    2002 IEEE REGION 10 CONFERENCE ON COMPUTERS, COMMUNICATIONS, CONTROL AND POWER ENGINEERING, VOLS I-III, PROCEEDINGS, 2002, : 449 - 452
  • [7] Spectral-domain speech enhancement for speech recognition
    You, Chang Huai
    Ma, Bin
    SPEECH COMMUNICATION, 2017, 94 : 30 - 41
  • [8] Robust speech recognition using singular value decomposition based speech enhancement
    Lilly, BT
    Paliwal, KK
    IEEE TENCON'97 - IEEE REGIONAL 10 ANNUAL CONFERENCE, PROCEEDINGS, VOLS 1 AND 2: SPEECH AND IMAGE TECHNOLOGIES FOR COMPUTING AND TELECOMMUNICATIONS, 1997, : 257 - 260
  • [9] Real Time Speech Enhancement in the Waveform Domain
    Defossez, Alexandre
    Synnaeve, Gabriel
    Adi, Yossi
    INTERSPEECH 2020, 2020, : 3291 - 3295
  • [10] JOINT ENCODING OF THE WAVEFORM AND SPEECH RECOGNITION FEATURES USING A TRANSFORM CODEC
    Fan, Xing
    Seltzer, Michael L.
    Droppo, Jasha
    Malvar, Henrique S.
    Acero, Alex
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5148 - 5151