Speech enhancement method based on the perceptual joint optimization deep neural network

Authors
Yuan W. [1]
Lou Y. [1]
Liang C. [1]
Wang Z. [1]
Affiliation
[1] College of Computer Science and Technology, Shandong University of Technology, Zibo
Keywords
Correlation; Cost function; Deep neural network; Speech enhancement
DOI
10.19665/j.issn1001-2400.2019.02.015
Abstract
Speech enhancement models based on deep neural networks (DNNs) are generally trained with the mean square error as the cost function, which is not tailored to the speech enhancement problem. To address this, a speech enhancement method based on a joint optimization DNN is proposed. Its cost function accounts for the correlation between adjacent frames of the network's output and includes a perceptual coefficient related to the presence of the speech component in each time-frequency unit. Experimental results show that, compared with the speech enhancement method based on the mean square error, the proposed method significantly improves the quality and intelligibility of the enhanced speech. © 2019, The Editorial Board of Journal of Xidian University. All rights reserved.
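The abstract does not give the cost function itself, so the following is only a rough sketch of the general idea, not the authors' formula. It assumes a per-unit weight in [0, 1] standing in for the perceptual coefficient, a frame-difference term standing in for the adjacent-frame correlation, and a hypothetical trade-off parameter `alpha`; the function name `joint_cost` is also illustrative.

```python
import numpy as np

def joint_cost(pred, target, weight, alpha=0.5):
    """Illustrative cost: weighted MSE plus an adjacent-frame difference term.

    pred, target: arrays of shape (frames, bins), e.g. log-power spectra.
    weight: same shape, values in [0, 1]; an assumed stand-in for a
            perceptual coefficient tied to speech presence per T-F unit.
    alpha: hypothetical trade-off between the two terms (assumption).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    weight = np.asarray(weight, dtype=float)

    # Squared error per time-frequency unit, scaled by the weight.
    weighted_mse = np.mean(weight * (pred - target) ** 2)

    # Mismatch in frame-to-frame changes along the time axis (axis 0).
    delta_err = np.diff(pred, axis=0) - np.diff(target, axis=0)
    delta_mse = np.mean(delta_err ** 2) if delta_err.size else 0.0

    return weighted_mse + alpha * delta_mse
```

With all weights set to 1 and `alpha` set to 0, this reduces to the plain mean square error that the abstract uses as its baseline.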
Pages: 90-94
Page count: 4