A systematic study of DNN based speech enhancement in reverberant and reverberant-noisy environments

Citations: 0

Authors:

Wang, Heming [1]

Pandey, Ashutosh [1]

Wang, Deliang [2]

Affiliations:

[1] Ohio State Univ, 281 Lane Ave, Columbus, OH 43210 USA

[2] Ctr Cognit & Brain Sci, 1835 Neil Ave, Columbus, OH 43210 USA

Source:

COMPUTER SPEECH AND LANGUAGE | 2025 / Vol. 89

Keywords:

Speech enhancement; Speech dereverberation; Self-attention; ARN; DC-CRN; NEURAL-NETWORK; DEREVERBERATION; IDENTIFICATION; RECOGNITION;

DOI:

10.1016/j.csl.2024.101677

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Deep learning has led to dramatic performance improvements for the task of speech enhancement, where deep neural networks (DNNs) are trained to recover clean speech from noisy and reverberant mixtures. Most of the existing DNN-based algorithms operate in the frequency domain, as time-domain approaches are believed to be less effective for speech dereverberation. In this study, we employ two DNNs: ARN (attentive recurrent network) and DC-CRN (densely-connected convolutional recurrent network), and systematically investigate the effects of different components on enhancement performance, such as window sizes, loss functions, and feature representations. We conduct evaluation experiments in two main conditions: reverberant-only and reverberant-noisy. Our findings suggest that incorporating larger window sizes is helpful for dereverberation, and adding transform operations (either convolutional or linear) to encode and decode waveform features improves the sparsity of the learned representations, and boosts the performance of time-domain models. Experimental results demonstrate that ARN and DC-CRN with proposed techniques achieve superior performance compared with other strong enhancement baselines.


Pages: 12
