An Overview of Monaural Speech Denoising and Dereverberation Research

Cited by: 0
Authors
Lan T. [1]
Peng C. [1]
Li S. [1]
Ye W. [1]
Li M. [1]
Hui G. [1]
Lü Y. [1]
Qian Y. [1]
Liu Q. [1]
Affiliation
[1] School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu
Source
Liu, Qiao (qliu@uestc.edu.cn) | 2020 / Science Press / No. 57
Funding
National Natural Science Foundation of China;
Keywords
Deep neural network; Machine learning; Speech denoising; Speech dereverberation; Speech enhancement;
DOI
10.7544/issn1000-1239.2020.20190306
Abstract
Speech enhancement refers to the use of audio signal processing techniques and algorithms to improve the intelligibility and quality of distorted speech signals. It has considerable research value and a wide range of applications, including speech recognition, VoIP, teleconferencing, and hearing aids. Most early work used unsupervised digital signal analysis methods to decompose the speech signal and obtain the characteristics of the clean speech and the noise. With the development of machine learning, supervised methods that aim to learn the relationship between noisy and clean speech signals were proposed. In particular, the introduction of deep learning has greatly improved performance. To help beginners and researchers in related fields understand the current state of research on this topic, this paper presents a comprehensive survey of the development of monaural speech enhancement and systematically summarizes it in terms of models and methods, datasets, features, evaluation metrics, and so on. First, we divide speech enhancement into noise reduction and dereverberation, and then review the existing work on traditional and machine-learning-based methods in each of these two directions. We also briefly introduce the main ideas of typical solutions and compare the performance of different methods. Next, the datasets, features, learning objectives, and evaluation metrics commonly used in experiments are enumerated and illustrated. Finally, four major challenges and the corresponding issues in this area are summarized. © 2020, Science Press. All rights reserved.
Pages: 928-953
Page count: 25