Multi-resolution auditory cepstral coefficient and adaptive mask for speech enhancement with deep neural network

Cited by: 7
Authors
Li, Ruwei [1 ]
Sun, Xiaoyue [1 ]
Liu, Yanan [1 ]
Yang, Dengcai [1 ]
Dong, Liang [2 ]
Affiliations
[1] Beijing Univ Technol, Sch Informat & Commun Engn, Fac Informat Technol, Beijing Key Lab Computat Intelligence & Intellige, Beijing, Peoples R China
[2] Baylor Univ, Elect & Comp Engn, Waco, TX 76798 USA
Funding
National Natural Science Foundation of China
Keywords
Speech enhancement; Deep neural network; Multi-resolution auditory cepstral coefficient; Adaptive mask; Noise
DOI
10.1186/s13634-019-0618-4
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract
The performance of existing speech enhancement algorithms is unsatisfactory in non-stationary noise environments at low signal-to-noise ratio (SNR). To address this problem, this paper presents a novel deep-learning speech enhancement algorithm that combines multiple features with an adaptive mask. First, we construct a new feature called the multi-resolution auditory cepstral coefficient (MRACC). Extracted from four cochleagrams of different resolutions, this feature captures both local information and spectrotemporal context while reducing algorithmic complexity. Second, we propose an adaptive mask (AM) for speech enhancement that tracks changes in the noise. As the SNR changes, the AM flexibly combines the advantages of the ideal binary mask (IBM) and the ideal ratio mask (IRM). Third, a deep neural network (DNN) serves as a nonlinear function that estimates the AM; the DNN takes MRACC together with its first and second derivatives as input. Finally, the noisy speech is weighted by the estimated AM to obtain the enhanced speech. Experimental results show that, compared with the baseline algorithms, the proposed algorithm further improves speech quality and intelligibility, suppresses more noise, and has lower complexity.
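The abstract names the IBM, the IRM, and an adaptive blend of the two, but gives no formulas. The sketch below uses the common definitions from the mask-based enhancement literature (IBM: 1 where the local SNR exceeds a local criterion LC, here 0 dB; IRM: (S^2 / (S^2 + N^2))^beta with beta = 0.5). The blending rule is a hypothetical stand-in, not the paper's formula: it assumes a logistic weight on the frame-level SNR, so low-SNR frames lean toward the IBM and high-SNR frames toward the IRM. The function names `ibm`, `irm`, `frame_snr_db`, `adaptive_mask`, `apply_mask` and the `slope` parameter are illustrative. Inputs are per-channel speech and noise powers for a single frame; the cochleagram front end and resynthesis used in the paper are not modeled.

```python
import math

# Powers are per-channel energies (S^2, N^2) for one frame and are
# assumed strictly positive so the logarithms and ratios are defined.


def ibm(s_pow, n_pow, lc_db=0.0):
    """Ideal binary mask: 1.0 where the local SNR (dB) exceeds lc_db, else 0.0."""
    return [1.0 if 10.0 * math.log10(s / n) > lc_db else 0.0
            for s, n in zip(s_pow, n_pow)]


def irm(s_pow, n_pow, beta=0.5):
    """Ideal ratio mask: (S^2 / (S^2 + N^2)) ** beta for each unit."""
    return [(s / (s + n)) ** beta for s, n in zip(s_pow, n_pow)]


def frame_snr_db(s_pow, n_pow):
    """Frame-level SNR in dB, computed over all channels."""
    return 10.0 * math.log10(sum(s_pow) / sum(n_pow))


def adaptive_mask(s_pow, n_pow, slope=1.0):
    """HYPOTHETICAL adaptive mask (assumed rule, not taken from the paper).

    alpha = logistic(slope * frame SNR in dB) weights the IRM and
    (1 - alpha) weights the IBM, so low-SNR frames lean toward the IBM
    and high-SNR frames toward the IRM.
    """
    alpha = 1.0 / (1.0 + math.exp(-slope * frame_snr_db(s_pow, n_pow)))
    return [(1.0 - alpha) * b + alpha * r
            for b, r in zip(ibm(s_pow, n_pow), irm(s_pow, n_pow))]


def apply_mask(noisy_mag, mask):
    """Weight each noisy time-frequency magnitude by its mask value."""
    return [y * m for y, m in zip(noisy_mag, mask)]
```

For example, `adaptive_mask([4.0, 1.0, 0.25], [1.0, 1.0, 1.0])` sees a frame SNR of about 2.4 dB, so the result stays close to the IRM but is pulled slightly toward the IBM in every channel. At test time the clean speech and noise are unknown, which is why, as the abstract describes, a DNN estimates the mask from noisy features; ideal masks like these would serve as training targets.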
Pages: 16
Related papers (50 in total)
  • [31] CellSegNet: An Adaptive Multi-resolution Hybrid Network for Cell Segmentation
    Deng, Junwei
    Shen, Yiqing
    Guo, Yi
    Ke, Jing
    MEDICAL IMAGING 2022: DIGITAL AND COMPUTATIONAL PATHOLOGY, 2022, 12039
  • [32] Binaural Deep Neural Network for Robust Speech Enhancement
    Jiang, Yi
    Liu, Runsheng
    2014 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, COMMUNICATIONS AND COMPUTING (ICSPCC), 2014, : 692 - 695
  • [33] Speech Enhancement based on Deep Convolutional Neural Network
    Nuthakki, Ramesh
    Masanta, Payel
    Yukta, T. N.
    PROCEEDINGS OF THE 2021 FIFTH INTERNATIONAL CONFERENCE ON I-SMAC (IOT IN SOCIAL, MOBILE, ANALYTICS AND CLOUD) (I-SMAC 2021), 2021, : 770 - 775
  • [34] Supervised speech enhancement based on deep neural network
    Saleem, Nasir
    Khattak, Muhammad Irfan
    Qazi, Abdul Baser
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2019, 37 (04) : 5187 - 5201
  • [35] The Application of Deep Neural Network in Speech Enhancement Processing
    Chen Jian-ming
    Liang Zhi-cheng
    2018 5TH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND CONTROL ENGINEERING (ICISCE 2018), 2018, : 1263 - 1266
  • [36] Partial Discharge Recognition with a Multi-Resolution Convolutional Neural Network
    Li, Gaoyang
    Wang, Xiaohua
    Li, Xi
    Yang, Aijun
    Rong, Mingzhe
    SENSORS, 2018, 18 (10)
  • [37] Medical ultrasound image speckle reduction and resolution enhancement using texture compensated multi-resolution convolution neural network
    Moinuddin, Muhammad
    Khan, Shujaat
    Alsaggaf, Abdulrahman U.
    Abdulaal, Mohammed Jamal
    Al-Saggaf, Ubaid M.
    Ye, Jong Chul
    FRONTIERS IN PHYSIOLOGY, 2022, 13
  • [38] Multi-resolution convolutional neural network for specific emitter identification
    Cui, Tianshu
    Li, Ruike
    Li, Zhihao
    Shi, Liang
    Zhang, Hongjiang
    PROCEEDINGS OF THE 2024 6TH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING SYSTEMS, SSPS 2024, 2024, : 56 - 62
  • [39] Multi-resolution attention convolutional neural network for crowd counting
    Zhang, Youmei
    Zhou, Chunluan
    Chang, Faliang
    Kot, Alex C.
    NEUROCOMPUTING, 2019, 329 : 144 - 152
  • [40] Speech Intelligibility Based Enhancement System Using Modified Deep Neural Network and Adaptive Multi-band Spectral Subtraction
    Dash, Tusar Kanti
    Solanki, Sandeep Singh
    WIRELESS PERSONAL COMMUNICATIONS, 2020, 111 (02) : 1073 - 1087