MONAURAL SPEECH ENHANCEMENT USING DEEP NEURAL NETWORKS BY MAXIMIZING A SHORT-TIME OBJECTIVE INTELLIGIBILITY MEASURE

Times cited: 0
Authors
Kolbaek, Morten [1]
Tan, Zheng-Hua [1]
Jensen, Jesper [1]
Affiliation
[1] Aalborg Univ, Dept Elect Syst, Aalborg, Denmark
Keywords
Speech Enhancement; Deep Neural Networks; Speech Intelligibility; Speech Denoising; Deep Learning; SPECTRAL AMPLITUDE ESTIMATOR; HEARING-IMPAIRED LISTENERS; ALGORITHM
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we propose a Speech Enhancement (SE) system based on Deep Neural Networks (DNNs) that is designed to maximize an approximation of the Short-Time Objective Intelligibility (STOI) measure. We formalize an approximate-STOI cost function, derive analytical expressions for the gradients required for DNN training, and show that these gradients have desirable properties when used with gradient-based optimization techniques. Simulation experiments show that the proposed SE system achieves large improvements in estimated speech intelligibility when tested on matched and unmatched natural noise types at multiple signal-to-noise ratios. Furthermore, we show that the SE system, when trained using an approximate-STOI cost function, performs on par with a system trained with a mean square error cost applied to short-time temporal envelopes. Finally, we show that the proposed SE system performs on par with a traditional DNN-based Short-Time Spectral Amplitude (STSA) SE system in terms of estimated speech intelligibility. These results are important because they suggest that traditional DNN-based STSA SE systems might be optimal in terms of estimated speech intelligibility.
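The abstract refers to an approximate-STOI cost function and its analytical gradients. The sketch below assumes that the approximation is the average sample correlation between clean and enhanced short-time temporal envelope segments (one per one-third octave band and segment), with STOI's normalization-and-clipping step dropped; this is an assumption for illustration, not a transcription of the paper's equations. All function names are hypothetical, and the gradient formula in the docstring is a straightforward derivation that may differ in form from the authors' expressions. Computing the envelopes themselves (STFT, one-third octave band grouping, 30-frame segmentation) is omitted; each segment is a plain list of floats. The envelope MSE cost that the abstract uses as a comparison is included alongside.

```python
import math
import random

EPS = 1e-12  # guards against division by zero for flat (constant) envelope segments


def _centered(v):
    """Return v with its sample mean removed."""
    mu = sum(v) / len(v)
    return [x - mu for x in v]


def _norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def envelope_correlation(a, a_hat):
    """Sample correlation between a clean envelope segment a and an enhanced segment a_hat."""
    ac, bc = _centered(a), _centered(a_hat)
    return sum(x * y for x, y in zip(ac, bc)) / (_norm(ac) * _norm(bc) + EPS)


def envelope_correlation_grad(a, a_hat):
    """Analytical gradient of envelope_correlation w.r.t. a_hat (derived for EPS -> 0).

    d rho / d a_hat = ac / (|ac| |bc|) - rho * bc / |bc|^2,
    where ac and bc are the mean-removed clean and enhanced segments.
    """
    ac, bc = _centered(a), _centered(a_hat)
    na, nb = _norm(ac), _norm(bc)
    rho = sum(x * y for x, y in zip(ac, bc)) / (na * nb + EPS)
    return [x / (na * nb + EPS) - rho * y / (nb * nb + EPS) for x, y in zip(ac, bc)]


def approx_stoi_loss(clean_segs, enhanced_segs):
    """Hypothetical cost to minimize: negative mean correlation over all (band, segment) pairs."""
    rhos = [envelope_correlation(a, b) for a, b in zip(clean_segs, enhanced_segs)]
    return -sum(rhos) / len(rhos)


def envelope_mse_loss(clean_segs, enhanced_segs):
    """Baseline cost: mean squared error over the same envelope segments."""
    total, count = 0.0, 0
    for a, b in zip(clean_segs, enhanced_segs):
        total += sum((x - y) ** 2 for x, y in zip(a, b))
        count += len(a)
    return total / count


if __name__ == "__main__":
    random.seed(0)
    a = [random.random() for _ in range(30)]  # 30 frames per segment, as in STOI
    b = [x + 0.3 * random.random() for x in a]  # hypothetical "enhanced" envelope
    g = envelope_correlation_grad(a, b)
    h = 1e-6
    # Central finite differences as an independent check of the analytical gradient.
    fd = []
    for i in range(len(b)):
        bp, bm = b[:], b[:]
        bp[i] += h
        bm[i] -= h
        fd.append((envelope_correlation(a, bp) - envelope_correlation(a, bm)) / (2 * h))
    print("max |analytic - numeric|:", max(abs(x - y) for x, y in zip(g, fd)))
```

Two properties follow directly from the gradient formula: its entries sum to zero, and it is orthogonal to the mean-removed enhanced segment. The correlation-based cost therefore ignores a constant offset or a positive gain applied to an enhanced segment, whereas `envelope_mse_loss` penalizes both. This sketch does not establish whether these are the gradient properties the abstract calls desirable.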
Pages: 5059-5063
Number of pages: 5
Related papers (50 in total)
  • [11] Predicting speech intelligibility with deep neural networks
    Spille, Constantin
    Ewert, Stephan D.
    Kollmeier, Birger
    Meyer, Bernd T.
    COMPUTER SPEECH AND LANGUAGE, 2018, 48 : 51 - 66
  • [12] SHORT-TIME OBJECTIVE ASSESSMENT OF SPEECH QUALITY
    Sharma, Dushyant
    Naylor, Patrick A.
    Gaubitch, Nikolay D.
    Brookes, Mike
    19TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO-2011), 2011, : 471 - 475
  • [14] Comparison of a short-time speech-based intelligibility metric to the speech transmission index and intelligibility data
    Payton, Karen L.
    Shrestha, Mona
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2013, 134 (05): : 3818 - 3827
  • [15] Joint Optimization of Modified Ideal Radio Mask and Deep Neural Networks for Monaural Speech Enhancement
    Han, Wei
    Wu, Congming
    Zhang, Xiongwei
    Zhang, Qiye
    Bai, Songting
    2017 IEEE 9TH INTERNATIONAL CONFERENCE ON COMMUNICATION SOFTWARE AND NETWORKS (ICCSN), 2017, : 1070 - 1074
  • [16] SPEECH ENHANCEMENT USING MULTIPLE DEEP NEURAL NETWORKS
    Karjol, Pavan
    Kumar, Ajay M.
    Ghosh, Prasanta Kumar
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5049 - 5053
  • [17] DISCRIMINATIVE DEEP RECURRENT NEURAL NETWORKS FOR MONAURAL SPEECH SEPARATION
    Wang, Guan-Xiang
    Hsu, Chung-Chien
    Chien, Jen-Tzung
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 2544 - 2548
  • [18] HELIUM SPEECH ENHANCEMENT USING THE SHORT-TIME FOURIER-TRANSFORM
    RICHARDS, MA
    IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1982, 30 (06): : 841 - 853
  • [19] Monaural speech enhancement combining deep neural network and convex optimization
    Zhang, Xiaoyan
    Zhang, Tianqi
    Ge, Wanying
    Bai, Yangliu
    Chinese Journal of Acoustics, 2021, 40 (03): : 460 - 476
  • [20] Monaural speech enhancement combining deep neural network and convex optimization
    Zhang, Xiaoyan
    Zhang, Tianqi
    Ge, Wanying
    Bai, Yangliu
    Shengxue Xuebao/Acta Acustica, 2021, 46 (03): : 471 - 480