Unseen Noise Estimation Using Separable Deep Auto Encoder for Speech Enhancement

Cited by: 62
Authors
Sun, Meng [1]
Zhang, Xiongwei [1]
Van hamme, Hugo [2]
Zheng, Thomas Fang [3]
Affiliations
[1] PLA Univ Sci & Technol, Lab Intelligent Informat Proc, Nanjing 210007, Jiangsu, Peoples R China
[2] Katholieke Univ Leuven, Elect Engn Dept ESAT, Speech Proc Res Grp, B-3000 Louvain, Belgium
[3] Tsinghua Univ, Res Inst Informat Technol, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep auto encoder; source separation; speech enhancement; unseen noise compensation; HMM;
DOI
10.1109/TASLP.2015.2498101
Chinese Library Classification number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
Unseen noise estimation is a key yet challenging step to make a speech enhancement algorithm work in adverse environments. At worst, the only prior knowledge we have about the encountered noise is that it is different from the involved speech. Therefore, by subtracting the components which cannot be adequately represented by a well-defined speech model, the noise can be estimated and removed. Given the good performance of deep learning in signal representation, a deep auto encoder (DAE) is employed in this work for accurately modeling the clean speech spectrum. In the subsequent stage of speech enhancement, an extra DAE is introduced to represent the residual part obtained by subtracting the estimated clean speech spectrum (produced by the pre-trained DAE) from the noisy speech spectrum. By adjusting the estimated clean speech spectrum and the unknown parameters of the noise DAE, one can reach a stationary point that minimizes the total reconstruction error of the noisy speech spectrum. The enhanced speech signal is then obtained by transforming the estimated clean speech spectrum back into the time domain. The proposed technique is called the separable deep auto encoder (SDAE). Given the under-determined nature of this optimization problem, the clean speech reconstruction is confined to the convex hull spanned by a pre-trained speech dictionary. New learning algorithms are investigated to respect the non-negativity of the parameters in the SDAE. Experimental results on TIMIT with 20 noise types at various noise levels demonstrate the superiority of the proposed method over the conventional baselines.
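The convex-hull constraint mentioned in the abstract can be illustrated with a small sketch. This is not the paper's SDAE: the noise DAE and the paper's learning algorithms are left out, and projected gradient descent is swapped in as a generic solver. The function names, the random dictionary `D`, and the step-size choice are all assumptions made for illustration. The sketch fits one non-negative spectral frame `y` as a convex combination of dictionary atoms and treats the non-negative residual as a crude noise estimate.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1}."""
    n = v.size
    u = np.sort(v)[::-1]                  # entries sorted in descending order
    css = np.cumsum(u) - 1.0              # shifted cumulative sums
    idx = np.arange(1, n + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def convex_hull_fit(y, D, n_iter=500):
    """Minimize 0.5 * ||D @ a - y||^2 over the simplex (hypothetical helper)."""
    k = D.shape[1]
    a = np.full(k, 1.0 / k)               # start at the centroid of the hull
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = D.T @ (D @ a - y)
        a = project_to_simplex(a - step * grad)
    return a


def separate_frame(y, D, n_iter=500):
    """Split a frame into a hull-constrained speech part and a residual."""
    a = convex_hull_fit(y, D, n_iter)
    speech = D @ a                        # lies in the convex hull of D's columns
    noise = np.maximum(y - speech, 0.0)   # non-negative residual as noise estimate
    return speech, noise, a


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = rng.random((20, 5))               # stand-in for a pre-trained speech dictionary
    a_true = project_to_simplex(rng.random(5))
    y = D @ a_true + 0.3 * (rng.random(20) > 0.8)  # sparse additive "noise"
    speech, noise, a = separate_frame(y, D)
    print("weights sum:", a.sum())
    print("residual energy:", float(noise @ noise))
```

Restricting the weights to the simplex keeps the speech estimate inside the span of known speech patterns, so energy that the dictionary cannot explain is pushed into the residual rather than absorbed by the speech model.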
Pages: 93-104
Page count: 12