A Reverberation-Time-Aware Approach to Speech Dereverberation Based on Deep Neural Networks

Cited by: 72

Authors:

Wu, Bo [1]

Li, Kehuang [2]

Yang, Minglei [1]

Lee, Chin-Hui [2]

Affiliations:

[1] Xidian Univ, Natl Lab Radar Signal Proc, Xian 710126, Peoples R China

[2] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2017 / Vol. 25 / Issue 01

Funding:

National Natural Science Foundation of China;

Keywords:

Acoustic context; deep neural networks (DNNs); frame shift; linear output layer; mean-variance normalization; reverberation-time-aware (RTA); speech dereverberation; ALGORITHM; SUPPRESSION; PREDICTION;

DOI:

10.1109/TASLP.2016.2623559

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

A reverberation-time-aware deep-neural-network (DNN)-based speech dereverberation framework is proposed to handle a wide range of reverberation times. There are three key steps in designing a robust system. First, in contrast to sigmoid activation and min-max normalization in state-of-the-art algorithms, a linear activation function at the output layer and global mean-variance normalization of target features are adopted to learn the complicated nonlinear mapping function from reverberant to anechoic speech and to improve the restoration of the low-frequency and intermediate-frequency contents. Next, two key design parameters, namely, frame shift size in speech framing and acoustic context window size at the DNN input, are investigated to show that RT60-dependent parameters are needed in the DNN training stage in order to optimize the system performance in diverse reverberant environments. Finally, the reverberation time is estimated to select the proper frame shift and context window sizes for feature extraction before feeding the log-power spectrum features to the trained DNNs for speech dereverberation. Our experimental results indicate that the proposed framework outperforms the conventional DNNs without taking the reverberation time into account, while achieving a performance only slightly worse than the oracle cases with known reverberation times even for extremely weak and severe reverberant conditions. It also generalizes well to unseen room sizes, loudspeaker and microphone positions, and recorded room impulse responses.
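
The three steps named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the RT60 thresholds, and every frame shift and context size in the lookup table are hypothetical placeholders, since the abstract gives no numeric settings and does not say in which direction the parameters should move with RT60. The sketch only shows the mechanics: per-bin global mean-variance normalization of targets (which pairs with an unbounded linear output layer), context-window splicing at the DNN input, and choosing both design parameters from an estimated RT60.

```python
import numpy as np


def global_mvn(targets):
    """Global mean-variance normalization: one mean/std per frequency bin,
    computed over all frames that are passed in."""
    mean = targets.mean(axis=0)
    std = targets.std(axis=0) + 1e-8  # guard against constant bins
    return (targets - mean) / std, mean, std


def denormalize(outputs, mean, std):
    """Map unbounded (linear-layer) outputs back to the log-power domain."""
    return outputs * std + mean


def splice_context(features, half_window):
    """Concatenate each frame with half_window neighbours on each side.
    Edge frames are repeated so the number of frames is unchanged."""
    padded = np.pad(features, ((half_window, half_window), (0, 0)), mode="edge")
    width = 2 * half_window + 1
    return np.stack(
        [padded[t:t + width].reshape(-1) for t in range(features.shape[0])]
    )


# HYPOTHETICAL table: (RT60 upper bound in seconds, frame shift in samples,
# context half-window in frames). Values and trend are placeholders only.
HYPOTHETICAL_RT60_TABLE = [
    (0.4, 256, 2),
    (0.8, 128, 4),
    (float("inf"), 64, 6),
]


def select_params(rt60_seconds):
    """Return (frame_shift, half_window) for an estimated RT60."""
    for upper_bound, frame_shift, half_window in HYPOTHETICAL_RT60_TABLE:
        if rt60_seconds <= upper_bound:
            return frame_shift, half_window
    raise ValueError("RT60 estimate must be a real number")


# Usage with random stand-in data (not real log-power spectra).
rng = np.random.default_rng(0)
fake_lps = rng.normal(size=(100, 257))
frame_shift, half_window = select_params(0.6)
dnn_input = splice_context(fake_lps, half_window)
normalized_targets, mean, std = global_mvn(fake_lps)
```

In a real system the RT60 estimate would come from a separate estimator and the table would be tuned on training data; both are outside what the abstract specifies.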


Pages: 102-111

Page count: 10

50 items in total

[41] Speech Separation of A Target Speaker Based on Deep Neural Networks
Du Jun
Tu Yanhui
Xu Yong
Dai Lirong
Lee Chin-Hui
2014 12TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2014, : 473 - 477
[42] SPEECH ENHANCEMENT BASED ON DEEP NEURAL NETWORKS WITH SKIP CONNECTIONS
Tu, Ming
Zhang, Xianxian
2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5565 - 5569
[43] Target Speech Signal Enhancement Based on Deep Neural Networks
Zhang, Xin
Wang, MingJiang
Xuan, XiaoGuang
Sun, FengJiao
2019 2ND IEEE INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND SIGNAL PROCESSING (ICICSP), 2019, : 241 - 245
[44] Acceleration Strategies for Speech Recognition based on Deep Neural Networks
Tian, Chao
Liu, Jia
Peng, Zhaomeng
MECHATRONICS ENGINEERING, COMPUTING AND INFORMATION TECHNOLOGY, 2014, 556-562 : 5181 - 5185
[45] The Representation of Speech in Deep Neural Networks
Scharenborg, Odette
van der Gouw, Nikki
Larson, Martha
Marchiori, Elena
MULTIMEDIA MODELING, MMM 2019, PT II, 2019, 11296 : 194 - 205
[46] A UNIFIED DEEP MODELING APPROACH TO SIMULTANEOUS SPEECH DEREVERBERATION AND RECOGNITION FOR THE REVERB CHALLENGE
Wu, Bo
Li, Kehuang
Huang, Zhen
Siniscalchi, Sabato Marco
Yang, Minglei
Lee, Chin-Hui
2017 HANDS-FREE SPEECH COMMUNICATIONS AND MICROPHONE ARRAYS (HSCMA 2017), 2017, : 36 - 40
[47] Deep Learning Based Dereverberation of Temporal Envelopes for Robust Speech Recognition
Purushothaman, Anurenjan
Sreeram, Anirudh
Kumar, Rohit
Ganapathy, Sriram
INTERSPEECH 2020, 2020, : 1688 - 1692
[48] Impact of reverberation through deep neural networks on adversarial perturbations
Cohendet, Romain
Solinas, Miguel
Bernhard, Remi
Reyboz, Marina
Moellic, Pierre-Alain
Bourrier, Yannick
Mermillod, Martial
20TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2021), 2021, : 840 - 846
[49] Comparison of CNN-based Speech Dereverberation using Neural Vocoder
Chun, Chanjun
Jeon, Kwang Myung
Leem, Chaejun
Lee, Bumshik
Choi, Wooyeol
3RD INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION (IEEE ICAIIC 2021), 2021, : 251 - 254
[50] HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks
Su, Jiaqi
Jin, Zeyu
Finkelstein, Adam
INTERSPEECH 2020, 2020, : 4506 - 4510
