Unseen Noise Estimation Using Separable Deep Auto Encoder for Speech Enhancement

Cited by: 62
Authors
Sun, Meng [1 ]
Zhang, Xiongwei [1 ]
Van hamme, Hugo [2 ]
Zheng, Thomas Fang [3 ]
Affiliations
[1] PLA Univ Sci & Technol, Lab Intelligent Informat Proc, Nanjing 210007, Jiangsu, Peoples R China
[2] Katholieke Univ Leuven, Elect Engn Dept ESAT, Speech Proc Res Grp, B-3000 Louvain, Belgium
[3] Tsinghua Univ, Res Inst Informat Technol, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep auto encoder; source separation; speech enhancement; unseen noise compensation; HMM;
DOI
10.1109/TASLP.2015.2498101
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Unseen noise estimation is a key yet challenging step in making a speech enhancement algorithm work in adverse environments. At worst, the only prior knowledge we have about the encountered noise is that it differs from the speech involved. Therefore, by subtracting the components that cannot be adequately represented by a well-defined speech model, the noise can be estimated and removed. Given the good performance of deep learning in signal representation, a deep auto encoder (DAE) is employed in this work to accurately model the clean speech spectrum. In the subsequent speech enhancement stage, an extra DAE is introduced to represent the residual obtained by subtracting the estimated clean speech spectrum (produced by the pre-trained DAE) from the noisy speech spectrum. By adjusting the estimated clean speech spectrum and the unknown parameters of the noise DAE, one can reach a stationary point that minimizes the total reconstruction error of the noisy speech spectrum. The enhanced speech signal is then obtained by transforming the estimated clean speech spectrum back into the time domain. The proposed technique is called the separable deep auto encoder (SDAE). Because this optimization problem is under-determined, the clean speech reconstruction is confined to the convex hull spanned by a pre-trained speech dictionary. New learning algorithms are investigated to respect the non-negativity of the parameters in the SDAE. Experimental results on TIMIT with 20 noise types at various noise levels demonstrate the superiority of the proposed method over conventional baselines.
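The separation idea in the abstract can be illustrated with a much simpler stand-in. The sketch below is not the paper's SDAE: both deep auto encoders are replaced by linear non-negative models, which is an assumption made purely for illustration. The speech estimate is a convex combination of the columns of a fixed dictionary `D` (so it stays in their convex hull), the noise estimate is a non-negative low-rank product `B @ H`, and block projected-gradient steps reduce the total reconstruction error of the noisy spectrum `Y`. All names (`project_simplex`, `separate`, `rank`, the synthetic data) are hypothetical.

```python
import numpy as np

def project_simplex(V):
    """Project each column of V onto the probability simplex."""
    K, T = V.shape
    U = -np.sort(-V, axis=0)                      # columns sorted in descending order
    css = np.cumsum(U, axis=0) - 1.0
    k = np.arange(1, K + 1)[:, None]
    cond = U - css / k > 0                        # always True in the first row
    rho = K - 1 - np.argmax(cond[::-1], axis=0)   # last row index where cond holds
    theta = css[rho, np.arange(T)] / (rho + 1)
    return np.maximum(V - theta, 0.0)

def separate(Y, D, rank=2, n_iter=200, seed=0):
    """Split Y into D @ W (speech stand-in) and B @ H (noise stand-in)."""
    rng = np.random.default_rng(seed)
    F, T = Y.shape
    K = D.shape[1]
    W = np.full((K, T), 1.0 / K)                  # start at the simplex centre
    B = 0.1 * rng.random((F, rank))
    H = 0.1 * rng.random((rank, T))
    L_D = np.linalg.norm(D, 2) ** 2               # Lipschitz constant for the W step
    errors = []
    for _ in range(n_iter):
        # Speech step: gradient step on W, then projection back onto the simplex.
        R = Y - D @ W - B @ H
        W = project_simplex(W + D.T @ R / L_D)
        # Noise steps: gradient steps on H and B, clipped to stay non-negative.
        R = Y - D @ W - B @ H
        H = np.maximum(H + B.T @ R / (np.linalg.norm(B, 2) ** 2 + 1e-12), 0.0)
        R = Y - D @ W - B @ H
        B = np.maximum(B + R @ H.T / (np.linalg.norm(H, 2) ** 2 + 1e-12), 0.0)
        errors.append(0.5 * np.sum((Y - D @ W - B @ H) ** 2))
    return D @ W, B @ H, W, errors

# Synthetic demo: random non-negative "spectra", not real speech data.
rng = np.random.default_rng(1)
F, K, T = 16, 5, 40
D = rng.random((F, K))                            # assumed pre-trained speech dictionary
W_true = rng.dirichlet(np.ones(K), size=T).T      # convex weights, shape (K, T)
noise = np.outer(rng.random(F), rng.random(T))    # rank-1 non-negative noise
Y = D @ W_true + noise
speech_hat, noise_hat, W, errors = separate(Y, D)
print(f"error after first iteration: {errors[0]:.4f}, after last: {errors[-1]:.4f}")
```

Each block update uses a step size of one over the Lipschitz constant of its partial gradient, so the reconstruction error does not increase from one iteration to the next. As in the abstract, the problem is under-determined, so a low error does not by itself guarantee that `speech_hat` matches the true speech component; the convex-hull constraint only narrows the set of admissible solutions.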
Pages: 93-104
Page count: 12
Related papers
50 items in total
  • [1] Improving Speech Enhancement in Unseen Noise Using Deep Convolutional Neural Network
    Yuan W.-H.
    Sun W.-Z.
    Xia B.
    Ou S.-F.
    Zidonghua Xuebao/Acta Automatica Sinica, 2018, 44 (04): : 751 - 759
  • [2] Enhanced Denoising Auto-Encoder for Robust Speech Recognition in Unseen Noise Conditions
    Joshi, Sonal
    Panda, Ashish
    Das, Biswajit
    2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 359 - 363
  • [3] Speech enhancement with noise estimation and filtration using deep learning models
    Kantamaneni, Sravanthi
    Charles, A.
    Babu, T. Ranga
    THEORETICAL COMPUTER SCIENCE, 2023, 941 : 14 - 28
  • [5] Binary Coding of Speech Spectrograms Using a Deep Auto-encoder
    Deng, L.
    Seltzer, M.
    Yu, D.
    Acero, A.
    Mohamed, A.
    Hinton, G.
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 1692 - +
  • [6] Speech Enhancement with Weighted Denoising Auto-Encoder
    Xia, Bing-yin
    Bao, Chang-chun
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 3411 - 3415
  • [7] Wiener filtering based speech enhancement with Weighted Denoising Auto-encoder and noise classification
    Xia, Bingyin
    Bao, Changchun
    SPEECH COMMUNICATION, 2014, 60 : 13 - 29
  • [8] MONAURAL SPEECH SEPARATION USING A PHASE-AWARE DEEP DENOISING AUTO ENCODER
    Williamson, Donald S.
    2018 IEEE 28TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2018,
  • [9] Deep Feature Learning for Tibetan Speech Recognition using Sparse Auto-encoder
    Wang, H.
    Zhao, Y.
    Liu, X. F.
    Xu, X. N.
    Wang, L.
    Zhou, N.
    Xu, Y. M.
    PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON ELECTRICAL, AUTOMATION AND MECHANICAL ENGINEERING (EAME 2015), 2015, 13 : 342 - 345
  • [10] Speech enhancement based on dynamic noise estimation within auto-correlation domain
    Wu, YD
    Wu, XH
    6TH WORLD MULTICONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL III, PROCEEDINGS: IMAGE, ACOUSTIC, SPEECH AND SIGNAL PROCESSING I, 2002, : 281 - 284