A systematic study of DNN based speech enhancement in reverberant and reverberant-noisy environments

Authors
Wang, Heming [1]
Pandey, Ashutosh [1]
Wang, Deliang [2]
Affiliations
[1] Ohio State Univ, 281 Lane Ave, Columbus, OH 43210 USA
[2] Ctr Cognit & Brain Sci, 1835 Neil Ave, Columbus, OH 43210 USA
Keywords
Speech enhancement; Speech dereverberation; Self-attention; ARN; DC-CRN; NEURAL-NETWORK; DEREVERBERATION; IDENTIFICATION; RECOGNITION;
DOI
10.1016/j.csl.2024.101677
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Deep learning has led to dramatic performance improvements for the task of speech enhancement, where deep neural networks (DNNs) are trained to recover clean speech from noisy and reverberant mixtures. Most existing DNN-based algorithms operate in the frequency domain, as time-domain approaches are believed to be less effective for speech dereverberation. In this study, we employ two DNNs, ARN (attentive recurrent network) and DC-CRN (densely-connected convolutional recurrent network), and systematically investigate the effects of different components on enhancement performance, such as window sizes, loss functions, and feature representations. We conduct evaluation experiments in two main conditions: reverberant-only and reverberant-noisy. Our findings suggest that incorporating larger window sizes is helpful for dereverberation, and that adding transform operations (either convolutional or linear) to encode and decode waveform features improves the sparsity of the learned representations and boosts the performance of time-domain models. Experimental results demonstrate that ARN and DC-CRN with the proposed techniques achieve superior performance compared with other strong enhancement baselines.
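The abstract mentions two ingredients of a time-domain pipeline: a window size for framing the waveform, and a linear transform that encodes and decodes the framed features. The sketch below only illustrates that general idea and is not the authors' ARN or DC-CRN. All function names, the 512-sample window, the 256-sample hop, the 16 kHz sampling rate, and the random orthogonal matrix standing in for a learned transform are assumptions made for the example.

```python
import numpy as np

def frame_signal(x, win, hop):
    """Split a 1-D waveform into overlapping frames of length `win`."""
    n_frames = 1 + (len(x) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]  # shape: (n_frames, win)

def overlap_add(frames, hop, length):
    """Reassemble frames by overlap-add, averaging where frames overlap."""
    n_frames, win = frames.shape
    out = np.zeros(length)
    weight = np.zeros(length)
    for i in range(n_frames):
        start = i * hop
        out[start:start + win] += frames[i]
        weight[start:start + win] += 1.0
    # Samples not covered by any frame keep weight 0 and stay at 0.
    return out / np.maximum(weight, 1.0)

rng = np.random.default_rng(0)
x = rng.standard_normal(16000)  # placeholder waveform (1 s at an assumed 16 kHz)

win, hop = 512, 256  # assumed values: 32 ms window, 16 ms hop at 16 kHz
frames = frame_signal(x, win, hop)

# Hypothetical stand-in for a learned linear encoder/decoder pair:
# a random orthogonal matrix and its transpose (which is its inverse).
q, _ = np.linalg.qr(rng.standard_normal((win, win)))
encoder, decoder = q, q.T

features = frames @ encoder          # encode each frame
# An enhancement network would modify `features` here; this sketch passes them through.
recon_frames = features @ decoder    # decode back to waveform frames
y = overlap_add(recon_frames, hop, len(x))

covered = (frames.shape[0] - 1) * hop + win  # samples covered by at least one frame
print(frames.shape, np.allclose(y[:covered], x[:covered]))
```

Changing `win` changes how much temporal context each frame carries, which is the knob the abstract refers to as window size. In a trained system the encoder and decoder would be learned rather than fixed.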
Pages: 12