Monaural Speech Dereverberation Using Deformable Convolutional Networks

Cited by: 2
Authors
Kothapally, Vinay [1]
Hansen, John H. L. [1]
Affiliations
[1] Univ Texas Dallas, Ctr Robust Speech Syst, Richardson, TX 75080 USA
Keywords
Speech enhancement; monaural dereverberation; deformable convolutional networks; minimum variance distortionless response; deep filtering; TIME-FREQUENCY MASKING; NEURAL-NETWORK; SELF-ATTENTION; ENHANCEMENT; NOISE; OPTIMIZATION; FRAMEWORK; DOMAIN; CNN;
DOI
10.1109/TASLP.2024.3358720
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Reverberation and background noise can degrade speech quality and intelligibility when speech is captured by a distant microphone. In recent years, researchers have developed several deep learning (DL)-based single-channel speech dereverberation systems that aim to minimize distortions introduced into speech captured in naturalistic environments. A majority of these DL-based systems enhance an unseen distorted speech signal by applying a predetermined set of weights to regions of the speech spectrogram, regardless of the degree of distortion within the respective regions. Such a system might not be an ideal solution for the dereverberation task. To address this, we present a DL-based end-to-end single-channel speech dereverberation system that uses deformable convolutional networks (DCNs), which dynamically adjust their receptive field based on the degree of distortion within an unseen speech signal. The proposed system includes the following components to simultaneously enhance the magnitude and phase responses of speech, which leads to improved perceptual quality: (i) a complex spectrum enhancement module that uses a multi-frame filtering technique to implicitly correct the phase response, (ii) a magnitude enhancement module that suppresses dominant reflections and recovers the formant structure using the deep filtering (DF) technique, and (iii) a speech activity detection (SAD) estimation module that predicts frame-wise speech activity to suppress residuals in non-speech regions. We assess the performance of the proposed system by employing objective speech quality metrics on both simulated and real speech recordings from the REVERB challenge corpus. The experimental results demonstrate the benefits of using DCNs and multi-frame filtering for the speech dereverberation task. We compare the performance of our proposed system against other signal processing (SP) and DL-based systems and observe that it consistently outperforms other approaches across all speech quality metrics.
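To illustrate what "dynamically adjusting the receptive field" can mean, here is a minimal one-dimensional sketch of a deformable convolution in pure Python. It is not the authors' implementation. The function name `deformable_conv1d`, the zero padding, the linear interpolation, and the caller-supplied offsets are all assumptions made for illustration. In a trained network the offsets would be predicted from the input itself, and the operation would normally run over a two-dimensional time-frequency representation.

```python
import math


def deformable_conv1d(x, weights, offsets):
    """1-D deformable convolution (cross-correlation form), zero padded.

    x       : list of floats, e.g. one frequency bin tracked over time
    weights : list of K kernel taps
    offsets : len(x) lists of K fractional offsets; hypothetical stand-in for
              the values an offset-prediction layer would produce
    """
    k_size = len(weights)
    half = k_size // 2

    def at(i):
        # Zero padding outside the signal.
        return x[i] if 0 <= i < len(x) else 0.0

    def sample(pos):
        # Linear interpolation at a fractional position.
        lo = math.floor(pos)
        frac = pos - lo
        return (1.0 - frac) * at(lo) + frac * at(lo + 1)

    out = []
    for t in range(len(x)):
        acc = 0.0
        for k in range(k_size):
            # Regular grid position (t + k - half) shifted by its offset.
            acc += weights[k] * sample(t + (k - half) + offsets[t][k])
        out.append(acc)
    return out


# Illustrative values only.
signal = [0.0, 1.0, 2.0, 3.0, 4.0]
kernel = [0.25, 0.5, 0.25]
no_shift = [[0.0, 0.0, 0.0] for _ in signal]      # behaves like a fixed kernel
widened = [[-1.0, 0.0, 1.0] for _ in signal]      # taps spread to t-2, t, t+2
fixed_out = deformable_conv1d(signal, kernel, no_shift)
wide_out = deformable_conv1d(signal, kernel, widened)
```

With all offsets at zero the function reduces to an ordinary fixed-kernel convolution, which corresponds to the "predetermined set of weights" behaviour the abstract contrasts against. Non-zero offsets let each output position draw on a wider or narrower context with the same kernel taps.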
Pages: 1712-1723
Number of pages: 12