DNN TRAINING BASED ON CLASSIC GAIN FUNCTION FOR SINGLE-CHANNEL SPEECH ENHANCEMENT AND RECOGNITION

Cited by: 0
Authors
Tu, Yan-Hui [1]
Du, Jun [1]
Lee, Chin-Hui [2]
Affiliations
[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China
[2] Georgia Inst Technol, Atlanta, GA 30332 USA
Source
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
statistical speech enhancement; ideal ratio mask; deep learning; gain function; speech recognition; NOISE;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
In conventional single-channel speech enhancement based on the noise power spectrum, the speech gain function, which suppresses background noise at each time-frequency bin, is calculated from the a priori signal-to-noise ratio (SNR). Accurate a priori SNR estimation is therefore paramount for successful noise suppression. Accordingly, we recently proposed a single-channel approach that combines conventional and deep learning techniques for speech enhancement and automatic speech recognition (ASR). However, the combination takes place at the testing stage, which makes the procedure complicated and time-consuming. In this study, the gain function of classic speech enhancement is used to optimize the ideal ratio mask based deep neural network (DNN-IRM) at the training stage; the resulting model is denoted GF-DNN-IRM. At the testing stage, the IRM estimated by the GF-DNN-IRM model is used directly to generate enhanced speech, without involving the conventional speech enhancement process. In addition, DNNs with fewer parameters in the causal processing mode are discussed. Experiments on the CHiME-4 challenge task show that the proposed algorithm achieves a relative word error rate reduction of 6.57% on the RealData test set compared with unprocessed speech, without acoustic model retraining in causal mode, whereas the traditional DNN-IRM method fails to improve ASR performance in this case.
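As a rough illustration of the two quantities the abstract relates, the sketch below computes a gain from the a priori SNR and an ideal ratio mask for a single time-frequency bin. It is not the authors' method: the abstract does not say which classic gain function is used, so the Wiener gain is assumed here, and the square-root form of the IRM is one common definition among several. All function names are hypothetical.

```python
import math


def db_to_linear(snr_db):
    """Convert an SNR in decibels to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)


def wiener_gain(prior_snr):
    """Wiener gain G = xi / (1 + xi) for a linear a priori SNR xi >= 0.

    Assumption: the Wiener rule stands in for the 'classic gain function';
    the abstract does not specify which one the paper uses.
    """
    if prior_snr < 0:
        raise ValueError("a priori SNR must be non-negative")
    return prior_snr / (1.0 + prior_snr)


def ideal_ratio_mask(speech_power, noise_power):
    """IRM for one bin, assumed here to be sqrt(S / (S + N)) on power values."""
    total = speech_power + noise_power
    if total == 0:
        return 0.0  # silent bin: nothing to keep
    return math.sqrt(speech_power / total)


def apply_mask(noisy_magnitudes, mask):
    """Scale each noisy magnitude by its per-bin mask value."""
    return [m * g for m, g in zip(noisy_magnitudes, mask)]


if __name__ == "__main__":
    # Hypothetical per-bin a priori SNRs in dB, for illustration only.
    for snr_db in (-10.0, 0.0, 10.0):
        xi = db_to_linear(snr_db)
        print(f"{snr_db:+.0f} dB -> gain {wiener_gain(xi):.3f}")
```

Under these assumptions both quantities lie in [0, 1] and increase with the local SNR, which is why a gain function can plausibly serve as a training-time guide for a mask estimator.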
Pages: 910 - 914
Page count: 5
Related papers
50 in total
  • [21] CompNet: Complementary network for single-channel speech enhancement
    Fan, Cunhang
    Zhang, Hongmei
    Li, Andong
    Xiang, Wang
    Zheng, Chengshi
    Lv, Zhao
    Wu, Xiaopei
    NEURAL NETWORKS, 2023, 168 : 508 - 517
  • [22] Single-channel speech enhancement using colored spectrograms
    Gul, Sania
    Khan, Muhammad Salman
    Fazeel, Muhammad
    COMPUTER SPEECH AND LANGUAGE, 2024, 86
  • [23] Comparative Studies of Single-Channel Speech Enhancement Techniques
    Kumar, Bittu
    Kumar, Neeraj
    Kumar, Manoj
    Prasad, S. V. S.
    Varma, Ashwini Kumar
    Ravi, Banoth
    IETE JOURNAL OF RESEARCH, 2024, 70 (06) : 5704 - 5720
  • [24] Single-Channel Speech Enhancement Using Double Spectrum
    Blass, Martin
    Mowlaee, Pejman
    Kleijn, W. Bastiaan
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1740 - 1744
  • [25] A spectral conversion approach to single-channel speech enhancement
    Mouchtaris, Athanasios
    Van der Spiegel, Jan
    Mueller, Paul
    Tsakalides, Panagiotis
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (04) : 1180 - 1193
  • [26] UltraSE: Single-Channel Speech Enhancement Using Ultrasound
    Sun, Ke
    Zhang, Xinyu
    PROCEEDINGS OF THE 27TH ACM ANNUAL INTERNATIONAL CONFERENCE ON MOBILE COMPUTING AND NETWORKING (ACM MOBICOM '21), 2021, : 160 - 173
  • [27] Phase-Aware Single-channel Speech Enhancement
    Mowlaee, Pejman
    Watanabe, Mario Kaoru
    Saeidi, Rahim
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1871 - 1873
  • [28] Smartphone-based single-channel speech enhancement application for hearing aids
    Shankar, Nikhil
    Bhat, Gautam Shreedhar
    Panahi, Issa M. S.
    Tittle, Stephanie
    Thibodeau, Linda M.
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2021, 150 (03) : 1663 - 1673
  • [29] Single-Channel Speech Enhancement With Phase Reconstruction Based on Phase Distortion Averaging
    Wakabayashi, Yukoh
    Fukumori, Takahiro
    Nakayama, Masato
    Nishiura, Takanobu
    Yamashita, Yoichi
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (09) : 1559 - 1569
  • [30] SINGLE-CHANNEL ENHANCEMENT OF CONVOLUTIVE NOISY SPEECH BASED ON A DISCRIMINATIVE NMF ALGORITHM
    Chung, Hanwook
    Plourde, Eric
    Champagne, Benoit
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 2302 - 2306