Multi-resolution auditory cepstral coefficient and adaptive mask for speech enhancement with deep neural network

Times cited: 7
Authors
Li, Ruwei [1]
Sun, Xiaoyue [1]
Liu, Yanan [1]
Yang, Dengcai [1]
Dong, Liang [2]
Affiliations
[1] Beijing Univ Technol, Sch Informat & Commun Engn, Fac Informat Technol, Beijing Key Lab Computat Intelligence & Intellige, Beijing, Peoples R China
[2] Baylor Univ, Elect & Comp Engn, Waco, TX 76798 USA
Funding
National Natural Science Foundation of China
Keywords
Speech enhancement; Deep neural network; Multi-resolution auditory cepstral coefficient; Adaptive mask; NOISE;
DOI
10.1186/s13634-019-0618-4
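Chinese Library Classification (CLC) numbers
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809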
Abstract
Existing speech enhancement algorithms perform poorly in non-stationary noise at low signal-to-noise ratio (SNR). To address this problem, this paper presents a speech enhancement algorithm based on multiple features and an adaptive mask with deep learning. First, we construct a new feature called the multi-resolution auditory cepstral coefficient (MRACC). This feature, extracted from four cochleagrams of different resolutions, captures local information and spectrotemporal context while reducing algorithm complexity. Second, we propose an adaptive mask (AM) that tracks changes in the noise. As the SNR changes, the AM flexibly combines the advantages of the ideal binary mask (IBM) and the ideal ratio mask (IRM). Third, a deep neural network (DNN) is used as a nonlinear function to estimate the AM, with MRACC and its first and second derivatives as input. Finally, the estimated AM is used to weight the noisy speech, yielding the enhanced speech. Experimental results show that the proposed algorithm not only further improves speech quality and intelligibility but also suppresses more noise than the compared algorithms. In addition, the proposed algorithm has lower complexity than the compared algorithms.
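The abstract says only that MRACC is extracted from four cochleagrams of different resolutions. The following is a minimal, hypothetical sketch of what such a feature could look like. It assumes a recipe borrowed from the well-known MRCG feature: a short-frame cochleagram, a long-frame cochleagram, and two versions of the short-frame cochleagram smoothed with 11 x 11 and 23 x 23 windows. It then applies cube-root compression and a per-frame DCT over channels to obtain cepstral coefficients. The function names (`box_smooth`, `dct_ii`, `mracc_like`), the cube-root compression, and the default `n_coef` are assumptions for illustration, not the paper's exact definition.

```python
import math


def box_smooth(cg, half):
    """Average each T-F unit over a (2*half+1) x (2*half+1) neighbourhood,
    truncating the window at the cochleagram edges (MRCG-style smoothing)."""
    n_ch, n_fr = len(cg), len(cg[0])
    out = [[0.0] * n_fr for _ in range(n_ch)]
    for c in range(n_ch):
        for t in range(n_fr):
            vals = [cg[i][j]
                    for i in range(max(0, c - half), min(n_ch, c + half + 1))
                    for j in range(max(0, t - half), min(n_fr, t + half + 1))]
            out[c][t] = sum(vals) / len(vals)
    return out


def dct_ii(x, n_coef):
    """Orthonormal DCT-II of vector x, keeping the first n_coef coefficients."""
    n = len(x)
    coefs = []
    for k in range(n_coef):
        s = sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i in range(n))
        coefs.append(s * (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)))
    return coefs


def mracc_like(cg_short, cg_long, n_coef=16):
    """Hypothetical MRACC-style feature.

    cg_short, cg_long: cochleagrams as [channel][frame] energy lists, computed
    with short and long analysis frames but sharing the same frame grid.
    Returns one feature vector per frame: 4 resolutions x n_coef cepstra.
    """
    if len(cg_short) != len(cg_long) or len(cg_short[0]) != len(cg_long[0]):
        raise ValueError("cochleagrams must share the (channels, frames) shape")
    resolutions = [cg_short, cg_long,
                   box_smooth(cg_short, 5),    # 11 x 11 window (assumed)
                   box_smooth(cg_short, 11)]   # 23 x 23 window (assumed)
    features = []
    for t in range(len(cg_short[0])):
        frame_vec = []
        for cg in resolutions:
            # Cube-root loudness compression (assumed), then DCT over channels.
            column = [cg[c][t] ** (1.0 / 3.0) for c in range(len(cg))]
            frame_vec.extend(dct_ii(column, n_coef))
        features.append(frame_vec)
    return features
```

Because the DCT decorrelates neighbouring channels, keeping only the first few coefficients per resolution gives a much shorter vector than stacking the raw cochleagrams. This is one plausible reading of the abstract's claim of reduced complexity.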
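The abstract feeds the DNN with MRACC plus its first and second derivatives. A common way to compute such derivatives is the regression ("delta") formula familiar from MFCC front ends: d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2). The sketch below assumes that formula with N = 2 and edge replication at the utterance boundaries. Neither choice is specified in the abstract, and the helper names are hypothetical.

```python
def deltas(feats, N=2):
    """Regression deltas over time for a list of per-frame feature vectors.
    Frames beyond the utterance edges are replaced by the edge frame."""
    T = len(feats)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        row = []
        for d in range(len(feats[0])):
            acc = 0.0
            for n in range(1, N + 1):
                nxt = feats[min(t + n, T - 1)][d]
                prv = feats[max(t - n, 0)][d]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_with_deltas(feats, N=2):
    """Concatenate static features with first and second derivatives,
    giving the kind of per-frame DNN input the abstract describes."""
    d1 = deltas(feats, N)
    d2 = deltas(d1, N)  # second derivative = delta of the delta
    return [f + a + b for f, a, b in zip(feats, d1, d2)]
```

For a feature that rises linearly over time, the interior first derivatives equal the slope and the interior second derivatives are zero, which is a quick sanity check on the formula.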
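The adaptive mask is described only as combining the IBM and IRM according to the SNR. Below is a minimal sketch of one way to do this, computed as an oracle training target from known speech and noise powers per time-frequency unit. The IBM uses a local criterion `lc_db` and the IRM uses the common exponent beta = 0.5. The frame-level weight `alpha` is a hypothetical sigmoid of frame SNR that favours the IRM at high SNR and the IBM at low SNR. That direction, the sigmoid's `center_db` and `slope`, and all function names are assumptions, not the paper's formula. At test time the DNN would estimate this mask, and `apply_mask` illustrates the final weighting of the noisy speech.

```python
import math

EPS = 1e-12  # guards against division by zero and log of zero


def ibm(s_pow, n_pow, lc_db=0.0):
    """Ideal binary mask: 1 if the local SNR exceeds the criterion lc_db."""
    return 1.0 if s_pow > n_pow * 10.0 ** (lc_db / 10.0) else 0.0


def irm(s_pow, n_pow, beta=0.5):
    """Ideal ratio mask with the commonly used exponent beta = 0.5."""
    return (s_pow / (s_pow + n_pow + EPS)) ** beta


def irm_weight(frame_snr_db, center_db=0.0, slope=0.5):
    """Hypothetical sigmoid: the weight on the IRM grows with frame SNR."""
    return 1.0 / (1.0 + math.exp(-slope * (frame_snr_db - center_db)))


def adaptive_mask(speech_pow, noise_pow):
    """speech_pow, noise_pow: lists of frames, each a list of per-channel
    powers. Returns an AM-style mask blending IRM and IBM per frame."""
    mask = []
    for s_frame, n_frame in zip(speech_pow, noise_pow):
        snr_db = 10.0 * math.log10((sum(s_frame) + EPS) / (sum(n_frame) + EPS))
        alpha = irm_weight(snr_db)
        mask.append([alpha * irm(s, n) + (1.0 - alpha) * ibm(s, n)
                     for s, n in zip(s_frame, n_frame)])
    return mask


def apply_mask(noisy_mag, mask):
    """Weight each noisy T-F magnitude by its (estimated) mask value."""
    return [[y * m for y, m in zip(y_frame, m_frame)]
            for y_frame, m_frame in zip(noisy_mag, mask)]
```

With equal speech and noise power, the frame SNR is 0 dB, so alpha = 0.5. The IBM is 0 there (the local SNR does not exceed 0 dB), which leaves half of the IRM value sqrt(0.5). At 20 dB frame SNR the mask is dominated by the IRM, and at -20 dB it is pushed towards the IBM's hard zero.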
Pages: 16