Monaural Speech Separation by Means of Convolutive Nonnegative Matrix Partial Co-factorization in Low SNR Condition

Cited by: 0
Authors
Dong X.-L. [1]
Hu Y. [1]
Huang H. [1, 2]
Wushour S. [1, 2]
Affiliations
[1] Department of Information Science and Engineering, Xinjiang University, Urumqi
[2] Laboratory of Multi-lingual Information Technology, Xinjiang University, Urumqi
Funding
National Natural Science Foundation of China
Keywords
Convolutive nonnegative matrix factorization (CNMF); Monaural speech; Nonnegative matrix partial co-factorization (NMPCF); Speech separation; Strong noise
DOI
10.16383/j.aas.c180065
Abstract
Nonnegative matrix partial co-factorization (NMPCF) is a joint matrix decomposition algorithm that integrates prior knowledge of a specific source to help separate that source's signal from monaural mixtures. Convolutive nonnegative matrix factorization (CNMF), which introduces a convolutive nonnegative basis set into the NMF process, opens up an interesting avenue of research in monaural sound separation. Building on these two algorithms, we propose a speech separation algorithm named convolutive nonnegative matrix partial co-factorization (CNMPCF) for monaural speech at low signal-to-noise ratio (SNR). First, a voice detection process based on a fundamental frequency estimation algorithm divides the mixture signal into vocal and nonvocal parts. The vocal parts are used as the test mixture signal, while the nonvocal parts (pure noise) participate in the partial joint decomposition. After CNMPCF, we obtain the separated speech spectrogram. The separated speech signal can then be reconstructed through the inverse short-time Fourier transform. In the experiments, we select five SNRs from 0 dB to -12 dB in 3 dB steps to obtain low-SNR speech mixtures. The results demonstrate that the proposed CNMPCF approach outperforms sparse convolutive nonnegative matrix factorization (SCNMF) and NMPCF under different noise types and noise intensities. Copyright © 2020 Acta Automatica Sinica. All rights reserved.
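The partial joint decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' CNMPCF: it implements the plain (non-convolutive) NMPCF idea with Euclidean-cost multiplicative updates, in which a noise basis is shared between a noise-only magnitude spectrogram and the mixture spectrogram. The function name `nmpcf`, the rank values, the weight `lam`, and the soft-mask reconstruction are all assumptions made for illustration; the voice detection step and the inverse short-time Fourier transform are omitted.

```python
import numpy as np

def nmpcf(X, Y, n_speech=8, n_noise=4, lam=1.0, n_iter=200, eps=1e-9, seed=0):
    """Illustrative NMPCF sketch (hypothetical helper, not the paper's code).

    X: nonnegative mixture spectrogram (freq x frames), vocal segments.
    Y: nonnegative noise-only spectrogram (freq x frames), nonvocal segments.
    Minimizes ||X - Ws Hs - Wn Hn||^2 + lam * ||Y - Wn Hy||^2,
    where the noise basis Wn is shared by both factorizations.
    """
    rng = np.random.default_rng(seed)
    F, T = X.shape
    Ws = rng.random((F, n_speech)) + eps   # speech basis (mixture only)
    Wn = rng.random((F, n_noise)) + eps    # noise basis (shared)
    Hs = rng.random((n_speech, T)) + eps   # speech activations
    Hn = rng.random((n_noise, T)) + eps    # noise activations in the mixture
    Hy = rng.random((n_noise, Y.shape[1])) + eps  # noise activations in Y
    losses = []
    for _ in range(n_iter):
        Xhat = Ws @ Hs + Wn @ Hn
        Ws *= (X @ Hs.T) / (Xhat @ Hs.T + eps)
        Xhat = Ws @ Hs + Wn @ Hn
        Hs *= (Ws.T @ X) / (Ws.T @ Xhat + eps)
        Xhat = Ws @ Hs + Wn @ Hn
        Hn *= (Wn.T @ X) / (Wn.T @ Xhat + eps)
        Yhat = Wn @ Hy
        Hy *= (Wn.T @ Y) / (Wn.T @ Yhat + eps)
        # The shared basis gathers evidence from both the mixture and the noise.
        Xhat = Ws @ Hs + Wn @ Hn
        Yhat = Wn @ Hy
        Wn *= (X @ Hn.T + lam * (Y @ Hy.T)) / (Xhat @ Hn.T + lam * (Yhat @ Hy.T) + eps)
        Xhat = Ws @ Hs + Wn @ Hn
        Yhat = Wn @ Hy
        losses.append(np.sum((X - Xhat) ** 2) + lam * np.sum((Y - Yhat) ** 2))
    # Soft mask: fraction of the modeled energy attributed to speech.
    speech = Ws @ Hs
    mask = speech / (speech + Wn @ Hn + eps)
    return mask * X, losses

# Synthetic nonnegative data standing in for magnitude spectrograms.
rng = np.random.default_rng(1)
X_demo = rng.random((64, 50))
Y_demo = rng.random((64, 30))
S_demo, loss_demo = nmpcf(X_demo, Y_demo)
```

In this sketch the estimated speech spectrogram `S_demo` has the same shape as the mixture. A convolutive variant would replace each basis matrix with a set of time-shifted bases, which is the extension the abstract attributes to CNMF.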
Pages: 1200-1209
Page count: 9