A systematic study of DNN based speech enhancement in reverberant and reverberant-noisy environments

Cited by: 0
Authors
Wang, Heming [1]
Pandey, Ashutosh [1]
Wang, DeLiang [2]
Affiliations
[1] Ohio State Univ, 281 Lane Ave, Columbus, OH 43210 USA
[2] Ctr Cognit & Brain Sci, 1835 Neil Ave, Columbus, OH 43210 USA
Source
Keywords
Speech enhancement; Speech dereverberation; Self-attention; ARN; DC-CRN; Neural network; Dereverberation; Identification; Recognition
DOI
10.1016/j.csl.2024.101677
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning has led to dramatic performance improvements for the task of speech enhancement, where deep neural networks (DNNs) are trained to recover clean speech from noisy and reverberant mixtures. Most of the existing DNN-based algorithms operate in the frequency domain, as time-domain approaches are believed to be less effective for speech dereverberation. In this study, we employ two DNNs: ARN (attentive recurrent network) and DC-CRN (densely-connected convolutional recurrent network), and systematically investigate the effects of different components on enhancement performance, such as window sizes, loss functions, and feature representations. We conduct evaluation experiments in two main conditions: reverberant-only and reverberant-noisy. Our findings suggest that incorporating larger window sizes is helpful for dereverberation, and adding transform operations (either convolutional or linear) to encode and decode waveform features improves the sparsity of the learned representations, and boosts the performance of time-domain models. Experimental results demonstrate that ARN and DC-CRN with proposed techniques achieve superior performance compared with other strong enhancement baselines.
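The abstract's idea of framing a waveform with a chosen window size and passing the frames through a linear encode/decode transform can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names `frame_signal` and `overlap_add`, the 512-sample window, the 256-sample hop, and the fixed orthogonal matrix standing in for learned encoder weights are all hypothetical choices. In the paper the transforms are trained inside ARN and DC-CRN.

```python
import numpy as np

def frame_signal(x, win, hop):
    """Split a 1-D waveform into overlapping frames of length `win`."""
    n_frames = 1 + (len(x) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def overlap_add(frames, hop):
    """Rebuild a waveform from frames, averaging samples where frames overlap."""
    n_frames, win = frames.shape
    out = np.zeros(hop * (n_frames - 1) + win)
    count = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + win] += frame
        count[i * hop:i * hop + win] += 1.0
    return out / count

rng = np.random.default_rng(0)
sr = 16000                       # assumed sampling rate
x = rng.standard_normal(sr)      # 1 s of noise as a stand-in waveform

# Assumed window/hop: 32 ms and 16 ms at 16 kHz. A larger `win` gives each
# frame more temporal context, which the abstract reports helps dereverberation.
win, hop = 512, 256
frames = frame_signal(x, win, hop)

# Hypothetical linear encoder: a fixed orthogonal matrix in place of learned
# weights, so that its transpose acts as an exact decoder.
enc, _ = np.linalg.qr(rng.standard_normal((win, win)))
dec = enc.T

features = frames @ enc          # encoded frame features (model input)
recon = overlap_add(features @ dec, hop)
```

With an orthogonal encoder the round trip is lossless, so `recon` matches the first `len(recon)` samples of `x`; a trained encoder and decoder would not generally have this property, and an enhancement network would sit between the two transforms.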
Pages: 12