DNN TRAINING BASED ON CLASSIC GAIN FUNCTION FOR SINGLE-CHANNEL SPEECH ENHANCEMENT AND RECOGNITION

Cited by: 0

Authors:

Tu, Yan-Hui [1]

Du, Jun [1]

Lee, Chin-Hui [2]

Affiliations:

[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China

[2] Georgia Inst Technol, Atlanta, GA 30332 USA

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China;

Keywords:

statistical speech enhancement; ideal ratio mask; deep learning; gain function; speech recognition; NOISE;

DOI:

Not available

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

In conventional single-channel speech enhancement based on the noise power spectrum, the speech gain function, which suppresses background noise at each time-frequency bin, is calculated from the prior signal-to-noise ratio (SNR). Hence, accurate prior SNR estimation is paramount for successful noise suppression. Accordingly, we recently proposed a single-channel approach that combines conventional and deep learning techniques for speech enhancement and automatic speech recognition (ASR). However, the combination takes place at the testing stage, which makes the procedure complicated and time-consuming. In this study, the gain function of classic speech enhancement is used to optimize the ideal ratio mask based deep neural network (DNN-IRM) at the training stage; the resulting model is denoted GF-DNN-IRM. At the testing stage, the IRM estimated by the GF-DNN-IRM model is used directly to generate enhanced speech, without involving the conventional speech enhancement process. In addition, DNNs with fewer parameters in the causal processing mode are also discussed. Experiments on the CHiME-4 challenge task show that the proposed algorithm achieves a relative word error rate reduction of 6.57% on the RealData test set compared with unprocessed speech, without acoustic model retraining and in causal mode, whereas the traditional DNN-IRM method fails to improve ASR performance in this case.
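The two quantities the abstract relates, a classic gain function driven by the prior SNR and the ideal ratio mask, can be illustrated with a minimal sketch. This record does not state which gain function the paper uses, so the Wiener gain is assumed here purely as a common classic example, and the IRM is assumed to be in its power-ratio form. All function names are hypothetical and are not taken from the paper.

```python
import numpy as np


def wiener_gain(prior_snr):
    """Wiener gain G = xi / (1 + xi) per time-frequency bin.

    Assumption: the Wiener rule stands in for the "classic gain function";
    the paper's actual choice is not given in this record.
    prior_snr is a linear (not dB) power ratio.
    """
    prior_snr = np.asarray(prior_snr, dtype=float)
    return prior_snr / (1.0 + prior_snr)


def ideal_ratio_mask(speech_power, noise_power, eps=1e-12):
    """Power-ratio IRM: S / (S + N) per time-frequency bin (assumed form)."""
    speech_power = np.asarray(speech_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    return speech_power / (speech_power + noise_power + eps)


def apply_mask(noisy_magnitude, mask):
    """Scale the noisy magnitude spectrum by a mask or gain in [0, 1]."""
    return np.asarray(noisy_magnitude, dtype=float) * np.asarray(mask, dtype=float)


if __name__ == "__main__":
    # Toy power spectra: 2 frames x 3 frequency bins (made-up numbers).
    speech = np.array([[3.0, 1.0, 0.0], [4.0, 2.0, 9.0]])
    noise = np.array([[1.0, 1.0, 1.0], [1.0, 2.0, 1.0]])
    gain = wiener_gain(speech / noise)
    mask = ideal_ratio_mask(speech, noise)
    print(np.round(gain, 3))
    print(np.round(mask, 3))
```

Under these assumptions, when the true speech and noise powers are known, the Wiener gain evaluated at the true SNR coincides with the power-ratio IRM. That is one way to see why a classic gain function and an IRM-based network target can be related, though it is not a description of the GF-DNN-IRM training objective itself.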


Pages: 910-914

Page count: 5
