Long short-term memory for speaker generalization in supervised speech separation

被引:186
|
作者
Chen, Jitong [1 ]
Wang, DeLiang [1 ,2 ]
机构
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
来源
关键词
NEURAL-NETWORKS; ALGORITHM; INTELLIGIBILITY; NOISE; MASKS;
D O I
10.1121/1.4986931
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Speech separation can be formulated as learning to estimate a time-frequency mask from acoustic features extracted from noisy speech. For supervised speech separation, generalization to unseen noises and unseen speakers is a critical issue. Although deep neural networks (DNNs) have been successful in noise-independent speech separation, DNNs are limited in modeling a large number of speakers. To improve speaker generalization, a separation model based on long short-term memory (LSTM) is proposed, which naturally accounts for temporal dynamics of speech. Systematic evaluation shows that the proposed model substantially outperforms a DNN-based model on unseen speakers and unseen noises in terms of objective speech intelligibility. Analyzing LSTM internal representations reveals that LSTM captures long-term speech contexts. It is also found that the LSTM model is more advantageous for low-latency speech separation and it, without future frames, performs better than the DNN model with future frames. The proposed model represents an effective approach for speaker-and noise-independent speech separation. (C) 2017 Acoustical Society of America.
引用
收藏
页码:4705 / 4714
页数:10
相关论文
共 50 条
  • [31] Detecting Overlapping Speech with Long Short-Term Memory Recurrent Neural Networks
    Geiger, Juergen T.
    Eyben, Florian
    Schuller, Bjoern
    Rigoll, Gerhard
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1667 - 1671
  • [32] Short-term Load Forecasting with Distributed Long Short-Term Memory
    Dong, Yi
    Chen, Yang
    Zhao, Xingyu
    Huang, Xiaowei
    2023 IEEE POWER & ENERGY SOCIETY INNOVATIVE SMART GRID TECHNOLOGIES CONFERENCE, ISGT, 2023,
  • [33] A short-term prediction model of global ionospheric VTEC based on the combination of long short-term memory and convolutional long short-term memory
    Peng Chen
    Rong Wang
    Yibin Yao
    Hao Chen
    Zhihao Wang
    Zhiyuan An
    Journal of Geodesy, 2023, 97
  • [34] A short-term prediction model of global ionospheric VTEC based on the combination of long short-term memory and convolutional long short-term memory
    Chen, Peng
    Wang, Rong
    Yao, Yibin
    Chen, Hao
    Wang, Zhihao
    An, Zhiyuan
    JOURNAL OF GEODESY, 2023, 97 (05)
  • [35] QUANTUM LONG SHORT-TERM MEMORY
    Chen, Samuel Yen-Chi
    Yoo, Shinjae
    Fang, Yao-Lung L.
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8622 - 8626
  • [36] LIPREADING WITH LONG SHORT-TERM MEMORY
    Wand, Michael
    Koutnik, Jan
    Schmidhuber, Jurgen
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 6115 - 6119
  • [37] Associative Long Short-Term Memory
    Danihelka, Ivo
    Wayne, Greg
    Uria, Benigno
    Kalchbrenner, Nal
    Graves, Alex
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [38] Nonlinear Dynamic Soft Sensor Modeling With Supervised Long Short-Term Memory Network
    Yuan, Xiaofeng
    Li, Lin
    Wang, Yalin
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2020, 16 (05) : 3168 - 3176
  • [39] Enhanced Deep Hierarchical Long Short-Term Memory and Bidirectional Long Short-Term Memory for Tamil Emotional Speech Recognition using Data Augmentation and Spatial Features
    Fernandes, Bennilo
    Mannepalli, Kasiprasad
    PERTANIKA JOURNAL OF SCIENCE AND TECHNOLOGY, 2021, 29 (04): : 2967 - 2992
  • [40] Speaker Change Detection in Broadcast TV using Bidirectional Long Short-Term Memory Networks
    Yin, Ruiqing
    Bredin, Herve
    Barras, Claude
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3827 - 3831