A Reverberation-Time-Aware Approach to Speech Dereverberation Based on Deep Neural Networks

Cited by: 72
Authors
Wu, Bo [1]
Li, Kehuang [2]
Yang, Minglei [1]
Lee, Chin-Hui [2]
Affiliations
[1] Xidian Univ, Natl Lab Radar Signal Proc, Xian 710126, Peoples R China
[2] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
Funding
National Natural Science Foundation of China;
Keywords
Acoustic context; deep neural networks (DNNs); frame shift; linear output layer; mean-variance normalization; reverberation-time-aware (RTA); speech dereverberation; ALGORITHM; SUPPRESSION; PREDICTION;
DOI
10.1109/TASLP.2016.2623559
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
A reverberation-time-aware deep-neural-network (DNN)-based speech dereverberation framework is proposed to handle a wide range of reverberation times. There are three key steps in designing a robust system. First, in contrast to sigmoid activation and min-max normalization in state-of-the-art algorithms, a linear activation function at the output layer and global mean-variance normalization of target features are adopted to learn the complicated nonlinear mapping function from reverberant to anechoic speech and to improve the restoration of the low-frequency and intermediate-frequency contents. Next, two key design parameters, namely, frame shift size in speech framing and acoustic context window size at the DNN input, are investigated to show that RT60-dependent parameters are needed in the DNN training stage in order to optimize the system performance in diverse reverberant environments. Finally, the reverberation time is estimated to select the proper frame shift and context window sizes for feature extraction before feeding the log-power spectrum features to the trained DNNs for speech dereverberation. Our experimental results indicate that the proposed framework outperforms the conventional DNNs without taking the reverberation time into account, while achieving a performance only slightly worse than the oracle cases with known reverberation times even for extremely weak and severe reverberant conditions. It also generalizes well to unseen room sizes, loudspeaker and microphone positions, and recorded room impulse responses.
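The three steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the per-frequency-bin statistics, the edge padding, and every number in the `RT60_CONFIG` lookup table are assumptions made here for illustration only, since the abstract gives no concrete values.

```python
import numpy as np


def global_mean_variance_normalize(features):
    """Normalize (frames, bins) features with one mean/std per frequency bin.

    Assumption: statistics are taken over all frames; the abstract does not
    say whether they are also pooled across bins.
    """
    mean = features.mean(axis=0)
    std = features.std(axis=0) + 1e-8  # guard against division by zero
    return (features - mean) / std, mean, std


def denormalize(normalized, mean, std):
    """Invert the normalization, e.g. on the output of a linear output layer."""
    return normalized * std + mean


def splice_context(features, context):
    """Stack each frame with `context` neighbours on both sides.

    Edge frames are repeated as padding (an assumption of this sketch).
    Output shape: (frames, (2 * context + 1) * bins).
    """
    padded = np.pad(features, ((context, context), (0, 0)), mode="edge")
    num_frames = features.shape[0]
    return np.concatenate(
        [padded[i:i + num_frames] for i in range(2 * context + 1)], axis=1
    )


# Hypothetical lookup table: (upper RT60 bound in seconds, parameters).
# All values are placeholders, not results from the paper.
RT60_CONFIG = [
    (0.3, {"frame_shift": 128, "context": 1}),
    (0.8, {"frame_shift": 256, "context": 3}),
    (float("inf"), {"frame_shift": 256, "context": 5}),
]


def select_config(rt60_seconds):
    """Pick frame shift and context size from an estimated RT60."""
    for upper_bound, config in RT60_CONFIG:
        if rt60_seconds <= upper_bound:
            return config
    return RT60_CONFIG[-1][1]
```

In such a pipeline, the estimated RT60 would first be passed to `select_config`, the log-power spectra would be extracted with the chosen frame shift, and `splice_context` would build the DNN input. Because mean-variance normalized targets are unbounded, a linear output layer can represent them directly, whereas a sigmoid output requires targets squeezed into a fixed range such as the one produced by min-max normalization.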
Pages: 102-111
Number of pages: 10