MONAURAL SPEECH ENHANCEMENT USING DEEP NEURAL NETWORKS BY MAXIMIZING A SHORT-TIME OBJECTIVE INTELLIGIBILITY MEASURE

Times cited: 0

Authors:

Kolbaek, Morten [1]

Tan, Zheng-Hua [1]

Jensen, Jesper [1]

Affiliation:

[1] Aalborg Univ, Dept Elect Syst, Aalborg, Denmark

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018

Keywords:

Speech Enhancement; Deep Neural Networks; Speech Intelligibility; Speech Denoising; Deep Learning; SPECTRAL AMPLITUDE ESTIMATOR; HEARING-IMPAIRED LISTENERS; ALGORITHM

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

In this paper, we propose a Deep Neural Network (DNN)-based Speech Enhancement (SE) system that is designed to maximize an approximation of the Short-Time Objective Intelligibility (STOI) measure. We formalize an approximate-STOI cost function, derive analytical expressions for the gradients required for DNN training, and show that these gradients have desirable properties when used together with gradient-based optimization techniques. Through simulation experiments, we show that the proposed SE system achieves large improvements in estimated speech intelligibility when tested on matched and unmatched natural noise types at multiple signal-to-noise ratios. Furthermore, we show that the SE system, when trained using an approximate-STOI cost function, performs on par with a system trained with a mean square error cost applied to short-time temporal envelopes. Finally, we show that the proposed SE system performs on par with a traditional DNN-based Short-Time Spectral Amplitude (STSA) SE system in terms of estimated speech intelligibility. These results are important because they suggest that traditional DNN-based STSA SE systems might be optimal in terms of estimated speech intelligibility.
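The abstract contrasts an approximate-STOI cost with a mean square error cost on short-time temporal envelopes. The sketch below is a minimal illustration, not the authors' implementation. It assumes that approximate STOI is the average sample correlation coefficient between clean and enhanced short-time envelope segments (STOI's per-segment intermediate measure, here with the clipping step omitted) and that one-third-octave envelope extraction has already been done. The function names `envelope_correlation`, `approx_stoi_loss` and `envelope_mse_loss` and the toy envelope values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch: envelopes are assumed to be precomputed
# one-third-octave short-time envelope segments (lists of floats).

def _center(v: Sequence[float]) -> List[float]:
    """Subtract the sample mean from a vector."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))

def envelope_correlation(x: Sequence[float], y: Sequence[float]) -> Tuple[float, List[float]]:
    """Sample correlation between a clean (x) and an enhanced (y) envelope
    segment, plus its analytic gradient with respect to y.
    Assumes neither segment is constant (nonzero variance)."""
    a, b = _center(x), _center(y)
    na, nb = math.sqrt(_dot(a, a)), math.sqrt(_dot(b, b))
    rho = _dot(a, b) / (na * nb)
    # d rho / d y = a / (|a||b|) - rho * b / |b|^2.
    # This vector is already zero-mean, so the centering of y does not alter it.
    grad = [ai / (na * nb) - rho * bi / (nb * nb) for ai, bi in zip(a, b)]
    return rho, grad

def approx_stoi_loss(pairs: Sequence[Tuple[Sequence[float], Sequence[float]]]):
    """Negative mean correlation over all (band, segment) envelope pairs.
    Minimizing this loss maximizes the assumed approximate-STOI score."""
    total = 0.0
    grads = []
    for x, y in pairs:
        rho, g = envelope_correlation(x, y)
        total += rho
        grads.append([-gi / len(pairs) for gi in g])
    return -total / len(pairs), grads

def envelope_mse_loss(pairs: Sequence[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Mean squared error between clean and enhanced envelope segments."""
    n = sum(len(x) for x, _ in pairs)
    return sum((xi - yi) ** 2 for x, y in pairs for xi, yi in zip(x, y)) / n

if __name__ == "__main__":
    clean = [1.0, 2.0, 4.0, 3.0]
    enhanced = [0.5, 1.0, 2.0, 1.5]  # a gain-scaled copy of the clean envelope
    loss, grads = approx_stoi_loss([(clean, enhanced)])
    print(round(loss, 6))                              # -1.0
    print(envelope_mse_loss([(clean, enhanced)]))      # 1.875
```

Because a correlation coefficient is invariant to a gain applied to the enhanced envelope, the sketch's correlation cost and its gradient reach their optimum for any scaled copy of the clean envelope, whereas the envelope MSE still penalizes the gain mismatch.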


Pages: 5059-5063

Number of pages: 5


[21] PERCEPTUALLY GUIDED SPEECH ENHANCEMENT USING DEEP NEURAL NETWORKS
Zhao, Yan
Xu, Buye
Giri, Ritwik
Zhang, Tao
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018: 5074-5078
[22] Analysis of Short-Time Magnitude Spectra for Improving Intelligibility Assessment of Dysarthric Speech
Sahu, Laxmi Priya
Pradhan, Gayadhar
CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2022, 41 (10): 5676-5698
[24] The role of short-time intensity and envelope power for speech intelligibility and psychoacoustic masking
Biberger, Thomas
Ewert, Stephan D.
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2017, 142 (02): 1098-1111
[25] Speech enhancement using Kalman filters for restoration of short-time DFT trajectories
Zavarehei, E
Vaseghi, S
Yan, Q
2005 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2005: 313-318
[26] Subjective intelligibility of deep neural network-based speech enhancement
Gelderblom, Femke B.
Tronstad, Tron V.
Viggen, Erlend Magnus
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017: 1968-1972
[27] Monaural speech enhancement combining accurate ratio mask and deep neural network
Bai, Haojun
Zhang, Tianqi
Liu, Jianxing
Ye, Shaopeng
Chinese Journal of Acoustics, 2022, 41 (04): 373-389
[28] On the Predictive Power of Objective Intelligibility Metrics for the Subjective Performance of Deep Complex Convolutional Recurrent Speech Enhancement Networks
Gelderblom, Femke B.
Tronstad, Tron Vedul
Svendsen, Torbjorn
Myrvoll, Tor Andre
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32: 215-226
[29] Speech Intelligibility Enhancement in Strong Mechanical Noise Based on Neural Networks
Cheng, Feng
Wang, Xiaochen
Gang, Li
Tu, Weiping
Wang, Jinshan
ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2017, PT II, 2018, 10736: 702-712
[30] Adaptive short-time analysis-synthesis for speech enhancement
Rudoy, Daniel
Basu, Prabahan
Quatieri, Thomas E.
Dunn, Bob
Wolfe, Patrick J.
2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008: 4905-+
