Robust TDOA Estimation Based on Time-Frequency Masking and Deep Neural Networks

Cited by: 27
Authors
Wang, Zhong-Qiu [1]
Zhang, Xueliang [3]
Wang, DeLiang [1, 2]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
[3] Inner Mongolia Univ, Dept Comp Sci, Hohhot, Peoples R China
Keywords
GCC-PHAT; time-frequency masking; robust TDOA estimation; deep neural networks; NOISE; LOCALIZATION; RECOGNITION; SEPARATION; ALGORITHM
DOI
10.21437/Interspeech.2018-1652
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning based time-frequency (T-F) masking has dramatically advanced monaural speech separation and enhancement. This study investigates its potential for robust time difference of arrival (TDOA) estimation in noisy and reverberant environments. Three novel algorithms are proposed to improve the robustness of conventional cross-correlation-, beamforming- and subspace-based algorithms for speaker localization. The key idea is to leverage the power of deep neural networks (DNNs) to accurately identify T-F units that are relatively clean for TDOA estimation. All of the proposed algorithms exhibit strong robustness for TDOA estimation in environments with low input SNR, high reverberation and low direct-to-reverberant energy ratio.
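The abstract's key idea, weighting T-F units by how clean they are before estimating the TDOA, can be illustrated with a small sketch built on GCC-PHAT (listed among the keywords). This is not the authors' implementation: the function names are hypothetical, the mask is set by hand rather than estimated by a DNN, delays are restricted to integer samples, and a naive DFT from the standard library stands in for an FFT so the example stays self-contained.

```python
import cmath
import math
import random


def dft(frame):
    """Naive DFT of a real frame; returns the non-negative-frequency bins."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]


def masked_gcc_phat(frames1, frames2, masks, max_delay):
    """Return the integer delay (samples) that maximizes mask-weighted GCC-PHAT.

    A positive result means microphone 2 lags microphone 1.
    masks holds one weight in [0, 1] per frame and frequency bin; here it is a
    hypothetical stand-in for a DNN-estimated T-F mask.
    """
    n = len(frames1[0])
    scores = {d: 0.0 for d in range(-max_delay, max_delay + 1)}
    for f1, f2, mask in zip(frames1, frames2, masks):
        s1, s2 = dft(f1), dft(f2)
        for k, (a, b, w) in enumerate(zip(s1, s2, mask)):
            cross = a * b.conjugate()
            if abs(cross) < 1e-12:
                continue  # skip bins with no usable phase
            phat = cross / abs(cross)  # phase transform: keep phase only
            for d in scores:
                steer = cmath.exp(-2j * math.pi * k * d / n)
                scores[d] += w * (phat * steer).real
    return max(scores, key=scores.get)


def circular_shift(frame, d):
    """Delay a frame by d samples with wrap-around (toy propagation model)."""
    return [frame[(i - d) % len(frame)] for i in range(len(frame))]


random.seed(0)
target = [random.gauss(0, 1) for _ in range(32)]
interferer = [random.gauss(0, 1) for _ in range(32)]

# Frame 0 carries the target (delay 3); frame 1 carries an interferer (delay -5).
frames1 = [target, interferer]
frames2 = [circular_shift(target, 3), circular_shift(interferer, -5)]

# The mask keeps the target-dominated frame and discards the other one.
keep_target = [[1.0] * 17, [0.0] * 17]
estimate = masked_gcc_phat(frames1, frames2, keep_target, max_delay=8)
print(estimate)  # prints 3
```

Swapping the two mask rows makes the same function report the interferer's delay instead, which is the sense in which the mask steers the estimate toward the units judged clean.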
Pages: 322-326
Number of pages: 5
Related papers
50 in total
  • [1] Robust DOA Estimation Based on Convolutional Neural Network and Time-Frequency Masking
    Zhang, Wangyou
    Zhou, Ying
    Qian, Yanmin
    INTERSPEECH 2019, 2019, : 2703 - 2707
  • [2] Segmented Time-Frequency Masking Algorithm for Speech Separation Based on Deep Neural Networks
    Guo, Xinyu
    Ou, Shifeng
    Gao, Meng
    Gao, Ying
    2020 13TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2020), 2020, : 445 - 450
  • [3] TIME DIFFERENCE OF ARRIVAL ESTIMATION OF SPEECH SIGNALS USING DEEP NEURAL NETWORKS WITH INTEGRATED TIME-FREQUENCY MASKING
    Pertila, Pasi
    Parviainen, Mikko
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 436 - 440
  • [4] Robust Speaker Localization Guided by Deep Learning-Based Time-Frequency Masking
    Wang, Zhong-Qiu
    Zhang, Xueliang
    Wang, DeLiang
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (01) : 178 - 188
  • [5] Localization based stereo speech source separation using probabilistic time-frequency masking and deep neural networks
    Yu, Yang
    Wang, Wenwu
    Han, Peng
    EURASIP Journal on Audio, Speech, and Music Processing, 2016
  • [6] Localization based stereo speech source separation using probabilistic time-frequency masking and deep neural networks
    Yu, Yang
    Wang, Wenwu
    Han, Peng
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2016
  • [7] Robust speech separation using time-frequency masking
    Aarabi, P
    Shi, GJ
    Jahromi, O
    2003 INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL I, PROCEEDINGS, 2003, : 741 - 744
  • [8] Time-Frequency Masking For Large Scale Robust Speech Recognition
    Wang, Yuxuan
    Misra, Ananya
    Chin, Kean K.
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2469 - 2473
  • [9] NEURAL NETWORK BASED TIME-FREQUENCY MASKING AND STEERING VECTOR ESTIMATION FOR TWO-CHANNEL MVDR BEAMFORMING
    Liu, Yuzhou
    Ganguly, Anshuman
    Kamath, Krishna
    Kristjansson, Trausti
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6717 - 6721
  • [10] Howling Noise Cancellation in Time-Frequency Domain by Deep Neural Networks
    Gan, Huaguo
    Luo, Gaoyong
    Luo, Yaqing
    Luo, Wenbin
    PROCEEDINGS OF SIXTH INTERNATIONAL CONGRESS ON INFORMATION AND COMMUNICATION TECHNOLOGY (ICICT 2021), VOL 2, 2022, 236 : 319 - 332