MONAURAL SPEECH ENHANCEMENT USING DEEP NEURAL NETWORKS BY MAXIMIZING A SHORT-TIME OBJECTIVE INTELLIGIBILITY MEASURE

Cited by: 0
Authors
Kolbaek, Morten [1]
Tan, Zheng-Hua [1]
Jensen, Jesper [1]
Affiliations
[1] Aalborg Univ, Dept Elect Syst, Aalborg, Denmark
Keywords
Speech Enhancement; Deep Neural Networks; Speech Intelligibility; Speech Denoising; Deep Learning; SPECTRAL AMPLITUDE ESTIMATOR; HEARING-IMPAIRED LISTENERS; ALGORITHM;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
In this paper we propose a Deep Neural Network (DNN) based Speech Enhancement (SE) system designed to maximize an approximation of the Short-Time Objective Intelligibility (STOI) measure. We formalize an approximate-STOI cost function, derive analytical expressions for the gradients required for DNN training, and show that these gradients have desirable properties when used together with gradient-based optimization techniques. We show through simulation experiments that the proposed SE system achieves large improvements in estimated speech intelligibility when tested on matched and unmatched natural noise types at multiple signal-to-noise ratios. Furthermore, we show that the SE system, when trained using an approximate-STOI cost function, performs on par with a system trained with a mean square error cost applied to short-time temporal envelopes. Finally, we show that the proposed SE system performs on par with a traditional DNN-based Short-Time Spectral Amplitude (STSA) SE system in terms of estimated speech intelligibility. These results are important because they suggest that traditional DNN-based STSA SE systems might be optimal in terms of estimated speech intelligibility.
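The abstract contrasts two training costs defined on short-time temporal envelopes: a correlation-style approximate-STOI cost and a mean square error cost. The abstract does not give the paper's exact definition, so the following is only a minimal sketch under the assumption that the approximate cost reduces to the sample linear correlation between clean and enhanced envelope vectors, averaged over segments. The function names (`envelope_correlation`, `correlation_cost`, `envelope_mse`) and the `eps` stabilizer are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math


def envelope_correlation(clean, enhanced, eps=1e-12):
    """Sample linear correlation between two equal-length envelope vectors.

    Assumption: each vector holds the short-time temporal envelope of one
    frequency band over one segment. `eps` guards against division by zero.
    """
    if len(clean) != len(enhanced):
        raise ValueError("envelope vectors must have the same length")
    n = len(clean)
    mean_c = sum(clean) / n
    mean_e = sum(enhanced) / n
    # Remove the mean of each vector before correlating.
    dc = [c - mean_c for c in clean]
    de = [e - mean_e for e in enhanced]
    num = sum(a * b for a, b in zip(dc, de))
    den = math.sqrt(sum(a * a for a in dc)) * math.sqrt(sum(b * b for b in de))
    return num / (den + eps)


def correlation_cost(clean_segments, enhanced_segments):
    """Negative mean correlation over segments.

    Minimizing this value maximizes the average envelope correlation.
    """
    scores = [
        envelope_correlation(c, e)
        for c, e in zip(clean_segments, enhanced_segments)
    ]
    return -sum(scores) / len(scores)


def envelope_mse(clean, enhanced):
    """Mean square error between two equal-length envelope vectors."""
    return sum((c - e) ** 2 for c, e in zip(clean, enhanced)) / len(clean)


if __name__ == "__main__":
    clean = [1.0, 2.0, 3.0, 4.0]
    scaled = [2.0 * x for x in clean]  # same shape, different gain
    print(envelope_correlation(clean, scaled))
    print(envelope_mse(clean, scaled))
```

In this toy case the enhanced envelope is a scaled copy of the clean one, so the correlation is close to 1 while the mean square error is nonzero. This illustrates one way the two costs can differ: a correlation-based cost ignores a constant gain, whereas a mean square error cost penalizes it.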
Pages: 5059 - 5063
Page count: 5
Related papers
50 in total
  • [21] PERCEPTUALLY GUIDED SPEECH ENHANCEMENT USING DEEP NEURAL NETWORKS
    Zhao, Yan
    Xu, Buye
    Giri, Ritwik
    Zhang, Tao
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5074 - 5078
  • [22] Analysis of Short-Time Magnitude Spectra for Improving Intelligibility Assessment of Dysarthric Speech
    Sahu, Laxmi Priya
    Pradhan, Gayadhar
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2022, 41 (10) : 5676 - 5698
  • [24] The role of short-time intensity and envelope power for speech intelligibility and psychoacoustic masking
    Biberger, Thomas
    Ewert, Stephan D.
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2017, 142 (02) : 1098 - 1111
  • [25] Speech enhancement using Kalman filters for restoration of short-time DFT trajectories
    Zavarehei, E
    Vaseghi, S
    Yan, Q
    2005 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2005, : 313 - 318
  • [26] Subjective intelligibility of deep neural network-based speech enhancement
    Gelderblom, Femke B.
    Tronstad, Tron V.
    Viggen, Erlend Magnus
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1968 - 1972
  • [27] Monaural speech enhancement combining accurate ratio mask and deep neural network
    BAI Haojun
    ZHANG Tianqi
    LIU Jianxing
    YE Shaopeng
    Chinese Journal of Acoustics, 2022, 41 (04) : 373 - 389
  • [28] On the Predictive Power of Objective Intelligibility Metrics for the Subjective Performance of Deep Complex Convolutional Recurrent Speech Enhancement Networks
    Gelderblom, Femke B.
    Tronstad, Tron Vedul
    Svendsen, Torbjorn
    Myrvoll, Tor Andre
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 215 - 226
  • [29] Speech Intelligibility Enhancement in Strong Mechanical Noise Based on Neural Networks
    Cheng, Feng
    Wang, Xiaochen
    Gang, Li
    Tu, Weiping
    Wang, Jinshan
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2017, PT II, 2018, 10736 : 702 - 712
  • [30] Adaptive short-time analysis-synthesis for speech enhancement
    Rudoy, Daniel
    Basu, Prabahan
    Quatieri, Thomas E.
    Dunn, Bob
    Wolfe, Patrick J.
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4905 - +