Robust Keyword Spotting for Noisy Environments by Leveraging Speech Enhancement and Speech Presence Probability

Cited by: 1
Authors
Yang, Chouchang [1]
Saidutta, Yashas Malur [1]
Srinivasa, Rakshith Sharma [1]
Lee, Ching-Hua [1]
Shen, Yilin [1]
Jin, Hongxia [1]
Affiliation
[1] Samsung Res Amer, Mountain View, CA 94043 USA
Source
Keywords
keyword spotting; speech commands; speech presence probability; noise robust; speech enhancement
DOI
10.21437/Interspeech.2023-2222
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Although various deep keyword spotting (KWS) systems have demonstrated promising performance in relatively quiet environments, accurate keyword detection in the presence of strong noise remains challenging. Room acoustics and noise conditions can be highly diverse, leading to drastic performance degradation if not handled carefully. In this paper, we propose a noise management front-end called SE-SPP Net that jointly performs speech enhancement (SE) and speech presence probability (SPP) estimation for robust KWS in noise. The SE-SPP Net estimates both the denoised Mel spectrogram and the position of the speech utterance in the noisy signal, where the latter is estimated as the probability that a particular time-frequency bin contains speech. Further, it comes at almost no additional cost in model size compared with a model that estimates only the denoised speech. Our SE-SPP Net can improve noisy KWS performance by up to 7% compared with a similarly sized state-of-the-art model at an SNR of -10 dB.
Pages: 1638-1642
Number of pages: 5
Related papers
50 in total
  • [21] Speech Keyword Spotting with Rule Based Segmentation
    Greibus, Mindaugas
    Telksnys, Laimutis
    INFORMATION AND SOFTWARE TECHNOLOGIES (ICIST 2013), 2013, 403 : 186 - 197
  • [22] Baseline for Keyword Spotting in Latvian Broadcast Speech
    Dargis, Roberts
    Znotins, Arturs
    HUMAN LANGUAGE TECHNOLOGIES - THE BALTIC PERSPECTIVE, BALTIC HLT 2014, 2014, 268 : 75 - 82
  • [23] Realizing Speech to Gesture Conversion by Keyword Spotting
    Zhao, Na
    Yang, Hongwu
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
  • [24] Binary Speech Features for Keyword Spotting Tasks
    Riviello, Alexandre
    David, Jean-Pierre
    INTERSPEECH 2019, 2019, : 3460 - 3464
  • [25] Unsupervised Speech Enhancement Using Optimal Transport and Speech Presence Probability
    Jiang, Wenbin
    Yu, Kai
    Wen, Fei
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 4445 - 4455
  • [26] SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS AS A FRONT END FOR ROBUST SPEECH RECOGNISER
    Lena, D. S. K.
    Vijayalakshmi, P.
    2017 INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND CONTROL SYSTEMS (ICICCS), 2017, : 430 - 435
  • [27] Robust estimators for speech enhancement in real environments
    Sandoval-Ibarra, Yuma
    Diaz-Ramirez, Victor H.
    Kober, Vitaly
    OPTICS AND PHOTONICS FOR INFORMATION PROCESSING IX, 2015, 9598
  • [28] Speech intelligibility improvement in noisy reverberant environments based on speech enhancement and inverse filtering
    Dong, Huan-Yu
    Lee, Chang-Myung
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2018,
  • [29] Speech intelligibility improvement in noisy reverberant environments based on speech enhancement and inverse filtering
    Huan-Yu Dong
    Chang-Myung Lee
    EURASIP Journal on Audio, Speech, and Music Processing, 2018
  • [30] Complex laplacian probability density function for noisy speech enhancement
    Chang, Joon-Hyuk
    IEICE ELECTRONICS EXPRESS, 2007, 4 (08): : 245 - 250