MONAURAL SPEECH ENHANCEMENT USING DEEP NEURAL NETWORKS BY MAXIMIZING A SHORT-TIME OBJECTIVE INTELLIGIBILITY MEASURE

Cited by: 0
Authors
Kolbaek, Morten [1]
Tan, Zheng-Hua [1]
Jensen, Jesper [1]
Affiliations
[1] Aalborg Univ, Dept Elect Syst, Aalborg, Denmark
Keywords
Speech Enhancement; Deep Neural Networks; Speech Intelligibility; Speech Denoising; Deep Learning; SPECTRAL AMPLITUDE ESTIMATOR; HEARING-IMPAIRED LISTENERS; ALGORITHM;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification code
070206 ; 082403 ;
Abstract
In this paper, we propose a Deep Neural Network (DNN) based Speech Enhancement (SE) system that is designed to maximize an approximation of the Short-Time Objective Intelligibility (STOI) measure. We formalize an approximate-STOI cost function, derive analytical expressions for the gradients required for DNN training, and show that these gradients have desirable properties when used together with gradient-based optimization techniques. We show through simulation experiments that the proposed SE system achieves large improvements in estimated speech intelligibility when tested on matched and unmatched natural noise types at multiple signal-to-noise ratios. Furthermore, we show that the SE system, when trained using an approximate-STOI cost function, performs on par with a system trained with a mean square error cost applied to short-time temporal envelopes. Finally, we show that the proposed SE system performs on par with a traditional DNN-based Short-Time Spectral Amplitude (STSA) SE system in terms of estimated speech intelligibility. These results are important because they suggest that traditional DNN-based STSA SE systems might be optimal in terms of estimated speech intelligibility.
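The abstract does not give the cost function itself, so the following is only a rough sketch of the general idea, not the authors' formulation. It assumes that an approximate-STOI cost can be modelled as the negative mean linear correlation between clean and enhanced short-time temporal envelope vectors, and it pairs that with a hand-derived gradient with respect to the enhanced envelope. The function names (`envelope_correlation`, `envelope_correlation_grad`, `approx_stoi_cost`), the `eps` stabilizer, and the segment length of 30 frames are all illustrative assumptions.

```python
import numpy as np


def envelope_correlation(clean_env, enhanced_env, eps=1e-12):
    """Sample linear correlation between two short-time envelope vectors."""
    a = clean_env - clean_env.mean()        # zero-mean clean envelope
    b = enhanced_env - enhanced_env.mean()  # zero-mean enhanced envelope
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def envelope_correlation_grad(clean_env, enhanced_env, eps=1e-12):
    """Analytical gradient of the correlation w.r.t. the enhanced envelope.

    Hand-derived for this sketch: a / (|a||b|) - rho * b / |b|^2,
    where a and b are the zero-mean envelopes and rho is their correlation.
    """
    a = clean_env - clean_env.mean()
    b = enhanced_env - enhanced_env.mean()
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    rho = a @ b / (na * nb + eps)
    return a / (na * nb + eps) - rho * b / (nb ** 2 + eps)


def approx_stoi_cost(clean_envs, enhanced_envs):
    """Negative mean correlation over envelope segments (assumed cost form).

    Both inputs have shape (num_segments, segment_length).
    """
    corrs = [envelope_correlation(c, e) for c, e in zip(clean_envs, enhanced_envs)]
    return -float(np.mean(corrs))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: 8 envelope segments of 30 frames each.
    clean = rng.random((8, 30)) + 0.1
    enhanced = clean + 0.3 * rng.random((8, 30))
    print("cost:", approx_stoi_cost(clean, enhanced))
    print("grad norm, first segment:",
          np.linalg.norm(envelope_correlation_grad(clean[0], enhanced[0])))
```

Under these assumptions the cost is invariant to a positive rescaling or constant offset of the enhanced envelope, which is a general property of the correlation coefficient. In a DNN training loop, a gradient of this kind would be back-propagated from the envelope to the network outputs; that step is omitted here.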
Pages: 5059 - 5063
Page count: 5
Related papers
50 in total
  • [1] PATHOLOGICAL SPEECH INTELLIGIBILITY ASSESSMENT BASED ON THE SHORT-TIME OBJECTIVE INTELLIGIBILITY MEASURE
    Janbakhshi, Parvaneh
    Kodrasi, Ina
    Bourlard, Herve
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6405 - 6409
  • [2] A SHORT-TIME OBJECTIVE INTELLIGIBILITY MEASURE FOR TIME-FREQUENCY WEIGHTED NOISY SPEECH
    Taal, Cees H.
    Hendriks, Richard C.
    Heusdens, Richard
    Jensen, Jesper
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4214 - 4217
  • [3] A NON-INTRUSIVE SHORT-TIME OBJECTIVE INTELLIGIBILITY MEASURE
    Andersen, Asger Heidemann
    de Haan, Jan Mark
    Tan, Zheng-Hua
    Jensen, Jesper
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5085 - 5089
  • [4] On the Relationship Between Short-Time Objective Intelligibility and Short-Time Spectral-Amplitude Mean-Square Error for Speech Enhancement
    Kolbaek, Morten
    Tan, Zheng-Hua
    Jensen, Jesper
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (02) : 283 - 295
  • [5] PERCEPTUAL IMPROVEMENT OF DEEP NEURAL NETWORKS FOR MONAURAL SPEECH ENHANCEMENT
    Han, Wei
    Zhang, Xiongwei
    Sun, Meng
    Shi, Wenhua
    Chen, Xushan
    Hu, Yonggang
    2016 IEEE INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2016,
  • [6] On the use of Band Importance Weighting in the Short-Time Objective Intelligibility Measure
    Andersen, Asger Heidemann
    de Haan, Jan Mark
    Tan, Zheng-Hua
    Jensen, Jesper
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2963 - 2967
  • [7] A Binaural Short Time Objective Intelligibility Measure for Noisy and Enhanced Speech
    Andersen, Asger Heidemann
    de Haan, Jan Mark
    Tan, Zheng-Hua
    Jensen, Jesper
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2563 - 2567
  • [8] Rethinking Complex-Valued Deep Neural Networks for Monaural Speech Enhancement
    Wu, Haibin
    Tan, Ke
    Xu, Buye
    Kumar, Anurag
    Wong, Daniel
    INTERSPEECH 2023, 2023, : 3889 - 3893
  • [9] Binaural Speech Intelligibility Estimation Using Deep Neural Networks
    Kondo, Kazuhiro
    Taira, Kazuya
    Kobayashi, Yosuke
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1858 - 1862
  • [10] Listening difficulty estimation model using short-time objective intelligibility measure for outdoor public address systems
    Noguchi, Keita
    Kobayashi, Yosuke
    Kishigami, Jay
    Kurisu, Kiyohiro
    ACOUSTICAL SCIENCE AND TECHNOLOGY, 2020, 41 (01) : 420 - 422