Monaural Speech Dereverberation Using Deformable Convolutional Networks

Cited by: 2
Authors
Kothapally, Vinay [1]
Hansen, John H. L. [1]
Affiliations
[1] Univ Texas Dallas, Ctr Robust Speech Syst, Richardson, TX 75080 USA
Keywords
Speech enhancement; monaural dereverberation; deformable convolutional networks; minimum variance distortionless response; deep filtering; TIME-FREQUENCY MASKING; NEURAL-NETWORK; SELF-ATTENTION; ENHANCEMENT; NOISE; OPTIMIZATION; FRAMEWORK; DOMAIN; CNN;
DOI
10.1109/TASLP.2024.3358720
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Reverberation and background noise can degrade the quality and intelligibility of speech captured by a distant microphone. In recent years, researchers have developed several deep learning (DL)-based single-channel speech dereverberation systems that aim to minimize the distortions introduced into speech captured in naturalistic environments. The majority of these DL-based systems enhance an unseen distorted speech signal by applying a predetermined set of weights to regions of the speech spectrogram, regardless of the degree of distortion within those regions. Such a system might not be an ideal solution for the dereverberation task. To address this, we present a DL-based end-to-end single-channel speech dereverberation system that uses deformable convolutional networks (DCNs) to dynamically adjust its receptive field based on the degree of distortion within an unseen speech signal. The proposed system includes the following components, which simultaneously enhance the magnitude and phase responses of speech and thereby improve perceptual quality: (i) a complex spectrum enhancement module that uses a multi-frame filtering technique to implicitly correct the phase response, (ii) a magnitude enhancement module that suppresses dominant reflections and recovers the formant structure using a deep filtering (DF) technique, and (iii) a speech activity detection (SAD) estimation module that predicts frame-wise speech activity to suppress residuals in non-speech regions. We assess the performance of the proposed system with objective speech quality metrics on both simulated and real speech recordings from the REVERB challenge corpus. The experimental results demonstrate the benefits of using DCNs and multi-frame filtering for the speech dereverberation task. We compare the proposed system against other signal processing (SP)-based and DL-based systems and observe that it consistently outperforms them across all speech quality metrics.
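To make concrete the idea of a convolution whose receptive field adapts to the input, the following is a minimal sketch of the sampling step in a generic deformable convolution. It is not the architecture from the paper, which this record does not describe. The function names (`bilinear_sample`, `deformable_conv_at`), the 3x3 kernel size, and the hand-supplied offsets are all assumptions made for illustration; in a trained DCN the offsets would be predicted from the input by a separate layer.

```python
import math


def bilinear_sample(spec, t, f):
    """Sample spec[time][freq] at a fractional position (t, f).

    Grid points outside the spectrogram contribute zero (zero padding).
    """
    t0, f0 = math.floor(t), math.floor(f)
    value = 0.0
    for ti in (t0, t0 + 1):
        for fi in (f0, f0 + 1):
            if 0 <= ti < len(spec) and 0 <= fi < len(spec[0]):
                # Bilinear weight: product of 1-D linear weights.
                weight = (1.0 - abs(t - ti)) * (1.0 - abs(f - fi))
                value += weight * spec[ti][fi]
    return value


def deformable_conv_at(spec, weights, offsets, t, f):
    """Output of one 3x3 deformable convolution at position (t, f).

    weights: 3x3 kernel weights.
    offsets: 3x3 grid of (dt, df) fractional shifts added to the regular
             kernel grid. Assumption: supplied by hand here; a real DCN
             predicts them from the input.
    """
    out = 0.0
    for i, grid_dt in enumerate((-1, 0, 1)):
        for j, grid_df in enumerate((-1, 0, 1)):
            dt, df = offsets[i][j]
            out += weights[i][j] * bilinear_sample(
                spec, t + grid_dt + dt, f + grid_df + df
            )
    return out


# Toy "spectrogram": a 5x5 grid whose value is 10 * time + freq.
spec = [[10.0 * t + f for f in range(5)] for t in range(5)]
mean_kernel = [[1.0 / 9.0] * 3 for _ in range(3)]

# Zero offsets reduce to an ordinary 3x3 convolution.
zero_offsets = [[(0.0, 0.0)] * 3 for _ in range(3)]
regular = deformable_conv_at(spec, mean_kernel, zero_offsets, 2, 2)

# Non-zero offsets shift every sampling point by half a bin in time and frequency.
shifted_offsets = [[(0.5, 0.5)] * 3 for _ in range(3)]
shifted = deformable_conv_at(spec, mean_kernel, shifted_offsets, 2, 2)
```

With all offsets set to zero the operation reduces to an ordinary convolution; non-zero offsets move each sampling point, which is how the effective receptive field can change from one time-frequency position to the next.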
Pages: 1712-1723
Page count: 12
Related papers
50 in total
  • [1] Monaural Speech Dereverberation Using Temporal Convolutional Networks With Self Attention
    Zhao, Yan
    Wang, DeLiang
    Xu, Buye
    Zhang, Tao
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 1598 - 1607
  • [2] Receptive Field Analysis of Temporal Convolutional Networks for Monaural Speech Dereverberation
    Ravenscroft, William
    Goetze, Stefan
    Hain, Thomas
    2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022, : 80 - 84
  • [3] Multi-Resolution Convolutional Residual Neural Networks for Monaural Speech Dereverberation
    Zhao, Lei
    Zhu, Wenbo
    Li, Shengqiang
    Luo, Hong
    Zhang, Xiao-Lei
    Rahardja, Susanto
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 2338 - 2351
  • [4] Speech Dereverberation Using Fully Convolutional Networks
    Ernst, Ori
    Chazan, Shlomo E.
    Gannot, Sharon
    Goldberger, Jacob
    2018 26TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2018, : 390 - 394
  • [5] UTTERANCE WEIGHTED MULTI-DILATION TEMPORAL CONVOLUTIONAL NETWORKS FOR MONAURAL SPEECH DEREVERBERATION
    Ravenscroft, William
    Goetze, Stefan
    Hain, Thomas
    2022 INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC 2022), 2022,
  • [6] An Overview of Monaural Speech Denoising and Dereverberation Research
    Lan T.
    Peng C.
    Li S.
    Ye W.
    Li M.
    Hui G.
    Lü Y.
    Qian Y.
    Liu Q.
    Liu, Qiao (qliu@uestc.edu.cn), 1600, Science Press (57): : 928 - 953
  • [7] SkipConvGAN: Monaural Speech Dereverberation Using Generative Adversarial Networks via Complex Time-Frequency Masking
    Kothapally, Vinay
    Hansen, John H. L.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 1600 - 1613
  • [8] A Deep Proximal-Unfolding Method for Monaural Speech Dereverberation
    Wang, Meihuang
    Yuan, Minmin
    Li, Andong
    Zheng, Chengshi
    Li, Xiaodong
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 324 - 329
  • [9] Blind dereverberation of monaural speech signals based on harmonic structure
    Nakatani, Tomohiro
    Miyoshi, Masato
    Kinoshita, Keisuke
    Systems and Computers in Japan, 2006, 37 (06): : 1 - 12
  • [10] On the importance of power compression and phase estimation in monaural speech dereverberation
    Li, Andong
    Zheng, Chengshi
    Peng, Renhua
    Li, Xiaodong
    JASA EXPRESS LETTERS, 2021, 1 (01):