Long short-term memory for speaker generalization in supervised speech separation

Cited by: 186
Authors
Chen, Jitong [1]
Wang, DeLiang [1, 2]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Source
Keywords
NEURAL-NETWORKS; ALGORITHM; INTELLIGIBILITY; NOISE; MASKS
DOI
10.1121/1.4986931
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speech separation can be formulated as learning to estimate a time-frequency mask from acoustic features extracted from noisy speech. For supervised speech separation, generalization to unseen noises and unseen speakers is a critical issue. Although deep neural networks (DNNs) have been successful in noise-independent speech separation, DNNs are limited in modeling a large number of speakers. To improve speaker generalization, a separation model based on long short-term memory (LSTM) is proposed, which naturally accounts for temporal dynamics of speech. Systematic evaluation shows that the proposed model substantially outperforms a DNN-based model on unseen speakers and unseen noises in terms of objective speech intelligibility. Analyzing LSTM internal representations reveals that LSTM captures long-term speech contexts. It is also found that the LSTM model is more advantageous for low-latency speech separation and that, without future frames, it performs better than the DNN model with future frames. The proposed model represents an effective approach for speaker- and noise-independent speech separation. © 2017 Acoustical Society of America.
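As an illustration of the idea in the abstract, the following is a minimal sketch of a causal (no future frames) LSTM that maps per-frame acoustic features to a time-frequency mask with values in (0, 1). It is not the authors' model: the class name `CausalLSTMMaskEstimator`, the layer sizes, the single-layer design, and the random untrained weights are all assumptions made for illustration, and the record does not specify the features, mask type, or network configuration used in the paper.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class CausalLSTMMaskEstimator:
    """Hypothetical single-layer LSTM producing one mask frame per input frame."""

    GATES = ("i", "f", "o", "g")  # input, forget, output gates and candidate

    def __init__(self, n_features, n_hidden, n_freq, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # Each gate acts on the concatenation [x_t ; h_{t-1}].
        self.W = {g: rand_matrix(n_hidden, n_features + n_hidden, rng)
                  for g in self.GATES}
        self.b = {g: [0.0] * n_hidden for g in self.GATES}
        self.b["f"] = [1.0] * n_hidden  # common choice: bias the forget gate open
        # Output layer: hidden state -> one mask value per frequency channel.
        self.W_out = rand_matrix(n_freq, n_hidden, rng)
        self.b_out = [0.0] * n_freq

    def step(self, x, h, c):
        z = list(x) + h
        pre = {g: [a + b for a, b in zip(matvec(self.W[g], z), self.b[g])]
               for g in self.GATES}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        # The cell state carries context from all earlier frames.
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        mask = [sigmoid(a + b) for a, b in zip(matvec(self.W_out, h), self.b_out)]
        return mask, h, c

    def estimate(self, frames):
        # Process frames strictly left to right: frame t never sees frames > t.
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        masks = []
        for x in frames:
            mask, h, c = self.step(x, h, c)
            masks.append(mask)
        return masks


if __name__ == "__main__":
    model = CausalLSTMMaskEstimator(n_features=4, n_hidden=3, n_freq=5)
    noisy_features = [[0.1 * t, -0.2, 0.3, 0.05 * t] for t in range(6)]
    for t, m in enumerate(model.estimate(noisy_features)):
        print(t, [round(v, 3) for v in m])
```

In a real system the weights would be trained on mixtures from many speakers and noises, and the estimated mask would be applied to the noisy time-frequency representation before resynthesis. Because the recurrence only runs forward in time, changing a later frame cannot alter masks already produced for earlier frames, which is the property that makes this kind of model suitable for low-latency use.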
Pages: 4705-4714
Number of pages: 10