A Reverberation-Time-Aware Approach to Speech Dereverberation Based on Deep Neural Networks

Cited by: 72
Authors
Wu, Bo [1]
Li, Kehuang [2]
Yang, Minglei [1]
Lee, Chin-Hui [2]
Affiliations
[1] Xidian Univ, Natl Lab Radar Signal Proc, Xian 710126, Peoples R China
[2] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
Funding
National Natural Science Foundation of China
Keywords
Acoustic context; deep neural networks (DNNs); frame shift; linear output layer; mean-variance normalization; reverberation-time-aware (RTA); speech dereverberation; ALGORITHM; SUPPRESSION; PREDICTION;
DOI
10.1109/TASLP.2016.2623559
Chinese Library Classification
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
A reverberation-time-aware deep-neural-network (DNN)-based speech dereverberation framework is proposed to handle a wide range of reverberation times. There are three key steps in designing a robust system. First, in contrast to sigmoid activation and min-max normalization in state-of-the-art algorithms, a linear activation function at the output layer and global mean-variance normalization of target features are adopted to learn the complicated nonlinear mapping function from reverberant to anechoic speech and to improve the restoration of the low-frequency and intermediate-frequency contents. Next, two key design parameters, namely, frame shift size in speech framing and acoustic context window size at the DNN input, are investigated to show that RT60-dependent parameters are needed in the DNN training stage in order to optimize the system performance in diverse reverberant environments. Finally, the reverberation time is estimated to select the proper frame shift and context window sizes for feature extraction before feeding the log-power spectrum features to the trained DNNs for speech dereverberation. Our experimental results indicate that the proposed framework outperforms the conventional DNNs without taking the reverberation time into account, while achieving a performance only slightly worse than the oracle cases with known reverberation times even for extremely weak and severe reverberant conditions. It also generalizes well to unseen room sizes, loudspeaker and microphone positions, and recorded room impulse responses.
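The feature-side steps named in the abstract (global mean-variance normalization of targets, and RT60-dependent choice of frame shift and context window) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the per-bin form of the statistics, the nearest-neighbor lookup, and every value in HYPOTHETICAL_TABLE are assumptions made for illustration only and do not reflect settings reported in the paper.

```python
import math
from typing import List, NamedTuple, Sequence, Tuple

Frame = List[float]  # one frame of log-power spectrum features


class FrameParams(NamedTuple):
    frame_shift_ms: float  # hop between successive analysis frames
    context_frames: int    # frames stacked on each side of the center frame


def global_stats(frames: Sequence[Frame]) -> Tuple[Frame, Frame]:
    """Mean and standard deviation per feature bin over all training frames.

    Assumption: statistics are kept per bin; the abstract only says "global".
    """
    n, dims = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((f[d] - means[d]) ** 2 for f in frames) / n
        stds.append(math.sqrt(var) if var > 0 else 1.0)  # avoid divide-by-zero
    return means, stds


def normalize(frame: Frame, means: Frame, stds: Frame) -> Frame:
    """Map a target frame to zero mean and unit variance (unbounded range,
    which is why it pairs with a linear rather than sigmoid output layer)."""
    return [(x - m) / s for x, m, s in zip(frame, means, stds)]


def denormalize(frame: Frame, means: Frame, stds: Frame) -> Frame:
    """Invert normalize() on a network output frame."""
    return [x * s + m for x, m, s in zip(frame, means, stds)]


# Hypothetical RT60 (seconds) -> parameters table. Placeholder values only.
HYPOTHETICAL_TABLE = {
    0.2: FrameParams(frame_shift_ms=8.0, context_frames=2),
    0.5: FrameParams(frame_shift_ms=8.0, context_frames=4),
    1.0: FrameParams(frame_shift_ms=8.0, context_frames=6),
}


def select_params(rt60_estimate: float) -> FrameParams:
    """Pick the table entry whose RT60 is closest to the estimate."""
    nearest = min(HYPOTHETICAL_TABLE, key=lambda r: abs(r - rt60_estimate))
    return HYPOTHETICAL_TABLE[nearest]


def stack_context(frames: Sequence[Frame], t: int, half_width: int) -> Frame:
    """Concatenate frames t-half_width .. t+half_width into one input vector,
    repeating the first or last frame at the utterance edges."""
    last = len(frames) - 1
    stacked: Frame = []
    for k in range(t - half_width, t + half_width + 1):
        stacked.extend(frames[min(max(k, 0), last)])
    return stacked
```

In use, an RT60 estimate from any blind estimator would be passed to select_params, and the returned context_frames would be given to stack_context when building the DNN input; the network output would then be passed through denormalize to recover log-power spectra.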
Pages: 102-111
Page count: 10
Related papers
50 in total
  • [31] Investigating Generative Adversarial Networks based Speech Dereverberation for Robust Speech Recognition
    Wang, Ke
    Zhang, Junbo
    Sun, Sining
    Wang, Yujun
    Xiang, Fei
    Xie, Lei
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1581 - 1585
  • [32] IMPROVING SPEECH RECOGNITION IN REVERBERATION USING A ROOM-AWARE DEEP NEURAL NETWORK AND MULTI-TASK LEARNING
    Giri, Ritwik
    Seltzer, Michael L.
    Droppo, Jasha
    Yu, Dong
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5014 - 5018
  • [33] A TRAINING FRAMEWORK FOR STEREO-AWARE SPEECH ENHANCEMENT USING DEEP NEURAL NETWORKS
    Toloosham, Bahareh
    Koishida, Kazuhito
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6962 - 6966
  • [34] State-Clustering Based Multiple Deep Neural Networks Modeling Approach for Speech Recognition
    Zhou, Pan
    Jiang, Hui
    Dai, Li-Rong
    Hu, Yu
    Liu, Qing-Feng
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (04) : 631 - 642
  • [35] Multi-Resolution Convolutional Residual Neural Networks for Monaural Speech Dereverberation
    Zhao, Lei
    Zhu, Wenbo
    Li, Shengqiang
    Luo, Hong
    Zhang, Xiao-Lei
    Rahardja, Susanto
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 2338 - 2351
  • [36] State-clustering based multiple deep neural networks modeling approach for speech recognition
    National Engineering Laboratory of Speech and Language Information Processing, University of Science and Technology of China, Hefei
    230026, China
    Unknown
    ON
    M3J1P3, Canada
    IEEE ACM Trans. Audio Speech Lang. Process., 4 (631-642):
  • [37] Segmented Time-Frequency Masking Algorithm for Speech Separation Based on Deep Neural Networks
    Guo, Xinyu
    Ou, Shifeng
    Gao, Meng
    Gao, Ying
    2020 13TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2020), 2020, : 445 - 450
  • [38] Binaural reverberant Speech separation based on deep neural networks
    Zhang, Xueliang
    Wang, DeLiang
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2018 - 2022
  • [39] An Experimental Study on Speech Enhancement Based on Deep Neural Networks
    Xu, Yong
    Du, Jun
    Dai, Li-Rong
    Lee, Chin-Hui
    IEEE SIGNAL PROCESSING LETTERS, 2014, 21 (01) : 65 - 68
  • [40] Helium Speech Correction Algorithm Based on Deep Neural Networks
    Li, Dongmei
    Zhang, Shibing
    Guo, Lili
    Chen, Yonghong
    2020 12TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS AND SIGNAL PROCESSING (WCSP), 2020, : 99 - 103