Robust TDOA Estimation Based on Time-Frequency Masking and Deep Neural Networks

Times cited: 27
Authors
Wang, Zhong-Qiu [1]
Zhang, Xueliang [3]
Wang, DeLiang [1,2]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
[3] Inner Mongolia Univ, Dept Comp Sci, Hohhot, Peoples R China
Keywords
GCC-PHAT; time-frequency masking; robust TDOA estimation; deep neural networks; NOISE; LOCALIZATION; RECOGNITION; SEPARATION; ALGORITHM
DOI
10.21437/Interspeech.2018-1652
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep-learning-based time-frequency (T-F) masking has dramatically advanced monaural speech separation and enhancement. This study investigates its potential for robust time difference of arrival (TDOA) estimation in noisy and reverberant environments. Three novel algorithms are proposed to improve the robustness of conventional cross-correlation-, beamforming- and subspace-based algorithms for speaker localization. The key idea is to leverage deep neural networks (DNNs) to accurately identify T-F units that are relatively clean, so that TDOA estimation relies mainly on those units. All of the proposed algorithms exhibit strong robustness for TDOA estimation in environments with low input SNR, high reverberation and a low direct-to-reverberant energy ratio.
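To illustrate the key idea for the cross-correlation-based case, here is a minimal Python/NumPy sketch of mask-weighted GCC-PHAT: the PHAT-normalized cross-spectrum of each T-F unit is weighted by a mask before being summed and scanned over candidate delays. This is a sketch under stated assumptions, not the authors' implementation. The mask is assumed to be already available as a (frames, bins) array (the DNN that would predict it is not shown), and the function names, Hann-window STFT settings, delay grid and `max_tau` default (about 0.34 m of microphone spacing at 343 m/s) are hypothetical choices made for this example.

```python
import numpy as np


def stft(x, n_fft=512, hop=128):
    """Hann-windowed STFT returning a (frames, bins) complex array."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def masked_gcc_phat_tdoa(x1, x2, fs, mask=None, max_tau=1e-3, n_fft=512, hop=128, n_grid=201):
    """Estimate the delay of x2 relative to x1 in seconds (positive if x2 lags).

    `mask` is a hypothetical (frames, bins) weight array in [0, 1], for example
    the product of DNN-estimated masks at the two microphones (an assumption).
    mask=None means no masking, which reduces to plain GCC-PHAT.
    """
    X1 = stft(x1, n_fft, hop)
    X2 = stft(x2, n_fft, hop)
    cross = X1 * np.conj(X2)
    # PHAT weighting: keep only the phase of each T-F unit's cross-spectrum.
    phat = cross / (np.abs(cross) + 1e-12)
    if mask is None:
        mask = np.ones(phat.shape)
    if mask.shape != phat.shape:
        raise ValueError("mask must have shape (frames, bins) = %s" % (phat.shape,))
    # Mask-weighted sum over frames: units the mask marks as clean dominate.
    weighted = (mask * phat).sum(axis=0)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    taus = np.linspace(-max_tau, max_tau, n_grid)
    # Evaluate the generalized cross-correlation on a grid of candidate delays.
    steering = np.exp(-2j * np.pi * np.outer(taus, freqs))
    scores = np.real(steering @ weighted)
    return taus[np.argmax(scores)]


# Usage: white noise at mic 1, the same signal delayed by 5 samples at mic 2.
rng = np.random.default_rng(0)
fs = 16000
x1 = rng.standard_normal(fs)
x2 = np.concatenate([np.zeros(5), x1[:-5]])
tau_plain = masked_gcc_phat_tdoa(x1, x2, fs)
# A random positive mask stands in for a DNN output here (illustration only).
demo_mask = rng.uniform(0.5, 1.0, size=stft(x1).shape)
tau_masked = masked_gcc_phat_tdoa(x1, x2, fs, mask=demo_mask)
```

With an all-ones mask this is ordinary GCC-PHAT. In a noisy, reverberant recording, the intent is that a mask close to 1 on speech-dominated units and close to 0 elsewhere keeps corrupted units from pulling the correlation peak away from the true delay.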
Pages: 322-326
Number of pages: 5