A Reverberation-Time-Aware Approach to Speech Dereverberation Based on Deep Neural Networks

Cited by: 72
Authors
Wu, Bo [1]
Li, Kehuang [2]
Yang, Minglei [1]
Lee, Chin-Hui [2]
Affiliations
[1] Xidian Univ, Natl Lab Radar Signal Proc, Xian 710126, Peoples R China
[2] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
Funding
National Natural Science Foundation of China
Keywords
Acoustic context; deep neural networks (DNNs); frame shift; linear output layer; mean-variance normalization; reverberation-time-aware (RTA); speech dereverberation; ALGORITHM; SUPPRESSION; PREDICTION;
DOI
10.1109/TASLP.2016.2623559
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
A reverberation-time-aware deep-neural-network (DNN)-based speech dereverberation framework is proposed to handle a wide range of reverberation times. There are three key steps in designing a robust system. First, in contrast to sigmoid activation and min-max normalization in state-of-the-art algorithms, a linear activation function at the output layer and global mean-variance normalization of target features are adopted to learn the complicated nonlinear mapping function from reverberant to anechoic speech and to improve the restoration of the low-frequency and intermediate-frequency contents. Next, two key design parameters, namely, frame shift size in speech framing and acoustic context window size at the DNN input, are investigated to show that RT60-dependent parameters are needed in the DNN training stage in order to optimize the system performance in diverse reverberant environments. Finally, the reverberation time is estimated to select the proper frame shift and context window sizes for feature extraction before feeding the log-power spectrum features to the trained DNNs for speech dereverberation. Our experimental results indicate that the proposed framework outperforms the conventional DNNs without taking the reverberation time into account, while achieving a performance only slightly worse than the oracle cases with known reverberation times even for extremely weak and severe reverberant conditions. It also generalizes well to unseen room sizes, loudspeaker and microphone positions, and recorded room impulse responses.
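The feature-extraction steps named in the abstract (RT60-dependent frame shift and context window, log-power spectrum features, global mean-variance normalization) can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names, the `RTA_TABLE` thresholds and values, the Hamming window, the frame length passed by the caller, and the choice to compute normalization statistics per frequency bin. The DNN itself and the RT60 estimator are not shown.

```python
import numpy as np

# Hypothetical lookup table: (upper RT60 bound in seconds,
# frame shift in samples, context half-width in frames).
# Placeholder values for illustration, NOT the settings from the paper.
RTA_TABLE = [
    (0.3, 64, 1),
    (0.6, 128, 3),
    (float("inf"), 256, 5),
]


def select_params(rt60_est):
    """Pick frame shift and context half-width from an estimated RT60."""
    for upper, frame_shift, half_ctx in RTA_TABLE:
        if rt60_est <= upper:
            return frame_shift, half_ctx
    raise ValueError("RT60 estimate must be a real number")


def log_power_spectrum(signal, frame_len, frame_shift, eps=1e-10):
    """Frame the signal, apply a Hamming window, return log-power spectra."""
    n_frames = 1 + (len(signal) - frame_len) // frame_shift
    window = np.hamming(frame_len)
    frames = np.stack([
        signal[i * frame_shift: i * frame_shift + frame_len] * window
        for i in range(n_frames)
    ])
    spectrum = np.fft.rfft(frames, axis=1)
    return np.log(np.abs(spectrum) ** 2 + eps)


def global_mvn(features, mean=None, std=None):
    """Mean-variance normalization with statistics shared by all frames.

    Assumption: statistics are computed per frequency bin. Pass the
    training-set mean and std to normalize test data consistently.
    """
    if mean is None:
        mean = features.mean(axis=0)
    if std is None:
        std = np.maximum(features.std(axis=0), 1e-8)  # avoid divide-by-zero
    return (features - mean) / std, mean, std


def stack_context(features, half_ctx):
    """Concatenate each frame with half_ctx neighbours on both sides."""
    n_frames = features.shape[0]
    padded = np.pad(features, ((half_ctx, half_ctx), (0, 0)), mode="edge")
    return np.concatenate(
        [padded[k: k + n_frames] for k in range(2 * half_ctx + 1)], axis=1
    )
```

In use, an RT60 estimate would be passed to `select_params`, and the returned frame shift and context half-width would drive `log_power_spectrum` and `stack_context` before the stacked, normalized features reach the network. With a linear output layer, the network's predictions would then be de-normalized with the stored target mean and standard deviation rather than being squeezed into a fixed range.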
Pages: 102-111
Number of pages: 10