Waveform-Domain Speech Enhancement Using Spectrogram Encoding for Robust Speech Recognition

Cited by: 4

Authors:

Shi, Hao [1]

Mimura, Masato [1]

Kawahara, Tatsuya [1]

Affiliations:

[1] Kyoto Univ, Grad Sch Informat, Kyoto 6068501, Japan

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2024 / Vol. 32

Keywords:

Speech enhancement; robust automatic speech recognition (ASR); time-frequency hybrid model; spectral information refining; FRAMEWORK;

DOI:

10.1109/TASLP.2024.3407511

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

While waveform-domain speech enhancement (SE) has been extensively investigated in recent years and achieves state-of-the-art performance on many datasets, spectrogram-based SE tends to show robust and stable enhancement behavior. In this paper, we propose a waveform-spectrogram hybrid method (WaveSpecEnc) to improve the robustness of waveform-domain SE. WaveSpecEnc refines the corresponding temporal feature map with a spectrogram encoding in each encoder layer. Incorporating spectral information yields robust performance in terms of human listening experience, but only a minor improvement in automatic speech recognition (ASR). We therefore extend the method for robust ASR by further using the spectrogram encoding information in both the SE front-end and the ASR back-end (WaveSpecEnc+). Experimental results on the CHiME-4 dataset show that the proposed method consistently improves ASR performance on the real evaluation sets and outperforms other methods, including DEMUCS and Conv-TasNet. Refining in the shallow encoder layers is very effective, and the effect is confirmed even with a strong ASR baseline using WavLM.
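The per-layer refinement described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the abstract only states that each encoder layer's temporal feature map is refined by a spectrogram encoding. The sigmoid gating, the nearest-frame alignment, the random weights, and all function names and layer sizes below are assumptions chosen for illustration.

```python
import numpy as np

def magnitude_spectrogram(wave, n_fft=64, hop=16):
    """Hann-windowed STFT magnitude, shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[i * hop:i * hop + n_fft] for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames * window, axis=1))

def temporal_encoder_layer(x, weight, stride):
    """Strided 1-D convolution followed by ReLU.

    x: (in_ch, T), weight: (out_ch, in_ch, kernel) -> (out_ch, T_out).
    """
    out_ch, _, kernel = weight.shape
    n_out = 1 + (x.shape[1] - kernel) // stride
    out = np.empty((out_ch, n_out))
    for t in range(n_out):
        patch = x[:, t * stride:t * stride + kernel]
        out[:, t] = np.tensordot(weight, patch, axes=([1, 2], [0, 1]))
    return np.maximum(out, 0.0)

def refine_with_spectrogram(feat, spec, proj):
    """Hypothetical refinement: gate temporal features with a spectral mask.

    feat: (C, T_feat), spec: (frames, F), proj: (C, F).
    Spectrogram frames are aligned to the feature time axis by
    nearest-index lookup, projected to C channels, and squashed to (0, 1).
    """
    idx = np.linspace(0, spec.shape[0] - 1, feat.shape[1]).round().astype(int)
    aligned = spec[idx]                                  # (T_feat, F)
    gate = 1.0 / (1.0 + np.exp(-(proj @ aligned.T)))     # (C, T_feat)
    return feat * gate

def hybrid_encode(wave, channels=(1, 4, 8), kernel=8, stride=4, seed=0):
    """Run the waveform through encoder layers, refining after each one."""
    rng = np.random.default_rng(seed)
    spec = magnitude_spectrogram(wave)
    x = wave[None, :]                                    # (1, T)
    for c_in, c_out in zip(channels[:-1], channels[1:]):
        weight = 0.1 * rng.standard_normal((c_out, c_in, kernel))
        proj = 0.1 * rng.standard_normal((c_out, spec.shape[1]))
        x = temporal_encoder_layer(x, weight, stride)
        x = refine_with_spectrogram(x, spec, proj)
    return x

wave = np.random.default_rng(1).standard_normal(1024)
encoded = hybrid_encode(wave)
print(encoded.shape)
```

In a trained model the convolution and projection weights would be learned jointly; the random values here only exercise the data flow from waveform and spectrogram to the refined feature map.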


Pages: 3049-3060

Page count: 12
