Long short-term memory for speaker generalization in supervised speech separation

Cited by: 186
Authors
Chen, Jitong [1]
Wang, DeLiang [1, 2]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Source
Keywords
NEURAL-NETWORKS; ALGORITHM; INTELLIGIBILITY; NOISE; MASKS;
DOI
10.1121/1.4986931
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
Speech separation can be formulated as learning to estimate a time-frequency mask from acoustic features extracted from noisy speech. For supervised speech separation, generalization to unseen noises and unseen speakers is a critical issue. Although deep neural networks (DNNs) have been successful in noise-independent speech separation, DNNs are limited in modeling a large number of speakers. To improve speaker generalization, a separation model based on long short-term memory (LSTM) is proposed, which naturally accounts for temporal dynamics of speech. Systematic evaluation shows that the proposed model substantially outperforms a DNN-based model on unseen speakers and unseen noises in terms of objective speech intelligibility. Analyzing LSTM internal representations reveals that LSTM captures long-term speech contexts. It is also found that the LSTM model is more advantageous for low-latency speech separation and it, without future frames, performs better than the DNN model with future frames. The proposed model represents an effective approach for speaker- and noise-independent speech separation. © 2017 Acoustical Society of America.
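The abstract frames separation as estimating a time-frequency mask. As a rough illustration of what such a mask is, the sketch below computes an ideal ratio mask, one commonly used training target, from known speech and noise signals and applies it to the mixture spectrogram. The abstract does not say which mask the paper uses, so the choice of mask, the synthetic signals, the frame settings, and the function names `stft_mag` and `ideal_ratio_mask` are all assumptions made for illustration.

```python
import numpy as np

def stft_mag(x, frame_len=320, hop=160):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))  # shape: (frames, frame_len // 2 + 1)

def ideal_ratio_mask(speech_mag, noise_mag, eps=1e-8):
    """Per-bin ratio of speech energy to total energy, with values in [0, 1]."""
    s2 = speech_mag ** 2
    n2 = noise_mag ** 2
    return np.sqrt(s2 / (s2 + n2 + eps))

# Synthetic stand-ins (assumptions): a 440 Hz tone for "speech", white noise for "noise".
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
speech = np.sin(2 * np.pi * 440 * t)
noise = 0.5 * rng.standard_normal(sr)
mixture = speech + noise

S = stft_mag(speech)
N = stft_mag(noise)
Y = stft_mag(mixture)

mask = ideal_ratio_mask(S, N)  # target a supervised model would learn to predict
enhanced_mag = mask * Y        # masked mixture magnitude
```

In a supervised system the mask is not available at test time; a model such as a DNN or LSTM would be trained to predict it from features of the noisy mixture alone.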
Pages: 4705-4714
Number of pages: 10