DNN TRAINING BASED ON CLASSIC GAIN FUNCTION FOR SINGLE-CHANNEL SPEECH ENHANCEMENT AND RECOGNITION

Cited by: 0

Authors:

Tu, Yan-Hui [1]

Du, Jun [1]

Lee, Chin-Hui [2]

Affiliations:

[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China

[2] Georgia Inst Technol, Atlanta, GA 30332 USA

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China;

Keywords:

statistical speech enhancement; ideal ratio mask; deep learning; gain function; speech recognition; NOISE;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

In conventional single-channel speech enhancement based on the noise power spectrum, the speech gain function, which suppresses background noise at each time-frequency bin, is calculated from the prior signal-to-noise ratio (SNR). Hence, accurate prior SNR estimation is paramount for successful noise suppression. Accordingly, we recently proposed a single-channel approach that combines conventional and deep learning techniques for speech enhancement and automatic speech recognition (ASR). However, the combination takes place at the testing stage, which makes the procedure complicated and time-consuming. In this study, the gain function of classic speech enhancement is used to optimize the ideal ratio mask based deep neural network (DNN-IRM) at the training stage; the resulting model is denoted GF-DNN-IRM. At the testing stage, the IRM estimated by the GF-DNN-IRM model is used directly to generate enhanced speech, without invoking the conventional speech enhancement process. In addition, DNNs with fewer parameters in the causal processing mode are also discussed. Experiments on the CHiME-4 challenge task show that the proposed algorithm achieves a relative word error rate reduction of 6.57% on the RealData test set compared with unprocessed speech, without acoustic model retraining in causal mode, whereas the traditional DNN-IRM method fails to improve ASR performance in this case.
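
As a rough illustration of the two quantities the abstract relates, the sketch below computes a classic gain from the prior SNR and an ideal ratio mask from speech and noise power. The abstract does not say which classic gain function the authors use, so the Wiener gain here is an assumption, as are the function names `wiener_gain` and `ideal_ratio_mask`, the `beta` exponent, and the `eps` stabilizer. It is not the GF-DNN-IRM training procedure itself.

```python
import numpy as np


def wiener_gain(prior_snr):
    """Wiener gain G = xi / (1 + xi), with xi the prior SNR on a linear scale.

    Assumption: the Wiener rule stands in for the unspecified classic gain.
    """
    xi = np.asarray(prior_snr, dtype=float)
    return xi / (1.0 + xi)


def ideal_ratio_mask(speech_power, noise_power, beta=0.5, eps=1e-12):
    """Ratio mask (S / (S + N)) ** beta per time-frequency bin.

    beta=0.5 is a common convention, assumed here; eps avoids division by zero.
    """
    s = np.asarray(speech_power, dtype=float)
    n = np.asarray(noise_power, dtype=float)
    return (s / (s + n + eps)) ** beta


if __name__ == "__main__":
    # Toy power spectra for three time-frequency bins (made-up values).
    speech = np.array([4.0, 1.0, 0.0])
    noise = np.array([1.0, 1.0, 1.0])
    print("gain:", wiener_gain(speech / noise))
    print("mask:", ideal_ratio_mask(speech, noise))
```

With `beta=1` and oracle speech and noise power, the mask coincides with the Wiener gain evaluated at `speech / noise`, which suggests why a classic gain function can serve as a training-time reference for a mask-estimating network.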


Pages: 910 - 914

Page count: 5

50 items in total

[1] SINGLE-CHANNEL SPEECH ENHANCEMENT WITH SEQUENTIALLY TRAINED DNN SYSTEM
Sun, Yang
Xian, Yang
Wang, Wenwu
Naqvi, Syed Mohsen
2019 13TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATION SYSTEMS (ICSPCS), 2019,
[2] Single-Channel Speech Enhancement Techniques for Distant Speech Recognition
Ashwini, Jaya
Kumaraswamy, Ramaswamy
JOURNAL OF INTELLIGENT SYSTEMS, 2013, 22 (02) : 81 - 93
[3] INVESTIGATION OF A PARAMETRIC GAIN APPROACH TO SINGLE-CHANNEL SPEECH ENHANCEMENT
Huang, Gongping
Chen, Jingdong
Benesty, Jacob
2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 206 - 210
[4] Robust Speaker Recognition Based on Single-Channel and Multi-Channel Speech Enhancement
Taherian, Hassan
Wang, Zhong-Qiu
Chang, Jorge
Wang, DeLiang
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 1293 - 1302
[5] Joint Optimization of Perceptual Gain Function and Deep Neural Networks for Single-Channel Speech Enhancement
Han, Wei
Zhang, Xiongwei
Min, Gang
Zhou, Xingyu
Sun, Meng
IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2017, E100A (02) : 714 - 717
[6] A Multi-Task Scheme for Supervised DNN-Based Single-Channel Speech Enhancement by Using Speech Presence Probability as the Secondary Training Target
Wang, Lei
Zhu, Jie
Sun, Kangbo
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2021, E104D (11) : 1963 - 1970
[7] Single-Channel Speech Enhancement Based on Psychoacoustic Masking
Zhou, Tingting
Zeng, Yumin
Wang, Rongrong
JOURNAL OF THE AUDIO ENGINEERING SOCIETY, 2017, 65 (04) : 272 - 284
[8] Single-Channel Multitalker Speech Recognition
Rennie, Steven J.
Hershey, John R.
Olsen, Peder A.
IEEE SIGNAL PROCESSING MAGAZINE, 2010, 27 (06) : 66 - 80
[9] Weak Speech Recovery for Single-Channel Speech Enhancement
Wong, Arthur
Ming, Kok
Low, Siow Yong
2012 4TH INTERNATIONAL CONFERENCE ON INTELLIGENT AND ADVANCED SYSTEMS (ICIAS), VOLS 1-2, 2012, : 627 - 631
[10] Single-channel speech enhancement based on frequency domain ALE
Nakanishi, Isao
Nagata, Yuudai
Itoh, Yoshio
Fukui, Yutaka
2006 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-11, PROCEEDINGS, 2006, : 2541 - 2544
