Monaural Speech Dereverberation Using Deformable Convolutional Networks

Cited by: 2
Authors
Kothapally, Vinay [1]
Hansen, John H. L. [1]
Affiliation
[1] Univ Texas Dallas, Ctr Robust Speech Syst, Richardson, TX 75080 USA
Keywords
Speech enhancement; monaural dereverberation; deformable convolutional networks; minimum variance distortionless response; deep filtering; TIME-FREQUENCY MASKING; NEURAL-NETWORK; SELF-ATTENTION; ENHANCEMENT; NOISE; OPTIMIZATION; FRAMEWORK; DOMAIN; CNN;
DOI
10.1109/TASLP.2024.3358720
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
Reverberation and background noise can degrade the quality and intelligibility of speech captured by a distant microphone. In recent years, researchers have developed several deep learning (DL)-based single-channel speech dereverberation systems that aim to minimize the distortions introduced into speech captured in naturalistic environments. Most of these DL-based systems enhance an unseen distorted speech signal by applying a predetermined set of weights to regions of the speech spectrogram, regardless of the degree of distortion within each region. Such a system may not be an ideal solution for the dereverberation task. To address this, we present a DL-based end-to-end single-channel speech dereverberation system built on deformable convolutional networks (DCNs), which dynamically adjust their receptive fields based on the degree of distortion within an unseen speech signal. The proposed system includes the following components, which simultaneously enhance the magnitude and phase responses of speech and thereby improve perceptual quality: (i) a complex spectrum enhancement module that uses a multi-frame filtering technique to implicitly correct the phase response, (ii) a magnitude enhancement module that suppresses dominant reflections and recovers the formant structure using a deep filtering (DF) technique, and (iii) a speech activity detection (SAD) estimation module that predicts frame-wise speech activity to suppress residuals in non-speech regions. We assess the performance of the proposed system using objective speech quality metrics on both simulated and real speech recordings from the REVERB challenge corpus. The experimental results demonstrate the benefits of using DCNs and multi-frame filtering for the speech dereverberation task. We compare the proposed system against other signal processing (SP) and DL-based systems and observe that it consistently outperforms them across all speech quality metrics.
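The abstract relies on two mechanisms: a learned multi-frame complex filter applied to each time-frequency bin, and a deformable convolution whose sampling positions shift by learned, data-dependent offsets. The following is a minimal NumPy sketch of both ideas, not the authors' implementation. It assumes a single channel, time-only filter taps (deep filtering is often formulated with taps over neighboring frequency bins as well), the conj(H) filtering convention, zero padding at the edges, and linear interpolation for fractional offsets (2-D DCNs typically use bilinear interpolation). The function names multi_frame_filter and deformable_conv1d are hypothetical, and the filter taps and offsets, which a network would predict, are simply passed in as arrays.

```python
import numpy as np


def multi_frame_filter(X, H):
    """Multi-frame complex filtering along time for each STFT bin.

    X: complex STFT, shape (T, F) (frames x frequency bins).
    H: complex filter taps, shape (T, F, L); a DNN would predict these,
       here they are simply passed in (hypothetical interface).
    Returns Y with Y[t, f] = sum_l conj(H[t, f, l]) * X[t - l, f],
    where frames before t = 0 are treated as zeros.
    """
    T, F = X.shape
    L = H.shape[2]
    # Prepend L - 1 zero frames so every output frame sees L input frames.
    Xp = np.concatenate([np.zeros((L - 1, F), dtype=X.dtype), X], axis=0)
    Y = np.zeros((T, F), dtype=np.result_type(X, H, np.complex128))
    for l in range(L):
        # X[t - l] is stored at row t + (L - 1) - l of the padded array.
        start = L - 1 - l
        Y += np.conj(H[:, :, l]) * Xp[start:start + T]
    return Y


def deformable_conv1d(x, w, offsets):
    """Single-channel 1-D deformable convolution (cross-correlation form).

    x: input signal, shape (N,); w: kernel, shape (K,), K odd.
    offsets: fractional offsets, shape (N, K), one per output position and
             kernel tap; a network would predict these (hypothetical here).
    y[n] = sum_k w[k] * x(n + k - K // 2 + offsets[n, k]), where x(.) at a
    fractional position is read by linear interpolation and samples outside
    [0, N - 1] count as zero.
    """
    N, K = x.shape[0], w.shape[0]
    y = np.zeros(N, dtype=np.result_type(x, w, offsets, np.float64))
    for n in range(N):
        for k in range(K):
            p = n + k - K // 2 + offsets[n, k]
            p0 = int(np.floor(p))
            frac = p - p0
            # Linear interpolation between the two neighbouring samples.
            for q, weight in ((p0, 1.0 - frac), (p0 + 1, frac)):
                if 0 <= q < N:
                    y[n] += w[k] * weight * x[q]
    return y
```

With all offsets set to zero, deformable_conv1d reduces to an ordinary zero-padded convolution; nonzero offsets let each kernel tap read the input at a shifted position, which is the sense in which the receptive field adapts to the signal. Likewise, multi_frame_filter with a single tap equal to 1 returns X unchanged, while additional taps combine several past frames of the same bin.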
Pages: 1712-1723
Number of pages: 12
Related papers
50 in total
  • [41] Monaural speech segregation using synthetic speech signals
    Brungart, DS
    Iyer, N
    Simpson, BD
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2006, 119 (04): : 2327 - 2333
  • [42] A Fast Convolutional Self-attention Based Speech Dereverberation Method for Robust Speech Recognition
    Li, Nan
    Ge, Meng
    Wang, Longbiao
    Dang, Jianwu
    NEURAL INFORMATION PROCESSING (ICONIP 2019), PT III, 2019, 11955 : 295 - 305
  • [43] COMPLEX SPECTRAL MAPPING WITH A CONVOLUTIONAL RECURRENT NETWORK FOR MONAURAL SPEECH ENHANCEMENT
    Tan, Ke
    Wang, DeLiang
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6865 - 6869
  • [44] Speech denoising and dereverberation using probabilistic models
    Attias, H
    Platt, JC
    Acero, A
    Deng, L
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 13, 2001, 13 : 758 - 764
  • [45] Light Field View Synthesis using Deformable Convolutional Neural Networks
    Zubair, Muhammad
    Nunes, Paulo
    Conti, Caroline
    Soares, Luis Ducla
    2024 PICTURE CODING SYMPOSIUM, PCS 2024, 2024,
  • [46] Fault Detection in Railway Switches using Deformable Convolutional Neural Networks
    Maack, Robert F.
    Tercan, Hasan
    Solvay, Alexia F.
    Mieth, Maximilian
    Meisen, Tobias
    2021 IEEE 19TH INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS (INDIN), 2021,
  • [47] Investigating Generative Adversarial Networks based Speech Dereverberation for Robust Speech Recognition
    Wang, Ke
    Zhang, Junbo
    Sun, Sining
    Wang, Yujun
    Xiang, Fei
    Xie, Lei
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1581 - 1585
  • [48] Deep Attention Gated Dilated Temporal Convolutional Networks with Intra-Parallel Convolutional Modules for End-to-End Monaural Speech Separation
    Shi, Ziqiang
    Lin, Huibin
    Liu, Liu
    Liu, Rujie
    Han, Jiqing
    Shi, Anyan
    INTERSPEECH 2019, 2019, : 3183 - 3187
  • [49] Speech Dereverberation With Context-Aware Recurrent Neural Networks
    Santos, Joao Felipe
    Falk, Tiago H.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (07) : 1232 - 1242
  • [50] An Attention-augmented Fully Convolutional Neural Network for Monaural Speech Enhancement
    Xu, Zezheng
    Jiang, Ting
    Li, Chao
    Yu, Jiacheng
    2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021,