Deep Scattering Power Spectrum Features for Robust Speech Recognition

Times cited: 7
Authors
Joy, Neethu M. [1]
Oglic, Dino [1]
Cvetkovic, Zoran [1]
Bell, Peter [2]
Renals, Steve [2]
Affiliations
[1] King's College London, Department of Engineering, London, England
[2] University of Edinburgh, Centre for Speech Technology Research, Edinburgh, Midlothian, Scotland
Source
Interspeech 2020
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
scattering coefficients; wavelet transform; robustness; deep scattering spectrum; power spectrum
DOI
10.21437/Interspeech.2020-2656
Chinese Library Classification (CLC) numbers
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification codes
100104; 100213
Abstract
The deep scattering spectrum consists of a cascade of wavelet transforms and modulus non-linearities. It generates features of different orders, with the first-order coefficients approximately equal to the Mel-frequency cepstrum and the higher-order coefficients recovering information lost at lower levels. We investigate how including the information recovered by higher-order coefficients affects the robustness of speech recognition. To that end, we also propose a modification of the original scattering transform tailored to noisy speech: instead of the modulus non-linearity, we work with power coefficients and therefore use the squared modulus non-linearity. We quantify the robustness of scattering features by the word error rates of acoustic models trained on clean speech and evaluated on sets of utterances corrupted with different noise types. Our empirical results show that the second-order scattering power spectrum coefficients capture invariants relevant to noise robustness and that this additional information improves generalization to unseen noise conditions (an almost 20% relative error reduction on AURORA4). This finding could have important consequences for speech recognition systems that typically discard second-order information and keep only the first-order features (known to emulate MFCC and FBANK values) when representing speech.
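To make the cascade described in the abstract concrete, here is a minimal two-layer sketch in plain Python. It is not the authors' implementation. The Gabor band-pass filters, the one-filter-per-octave spacing, the helper names (`gabor_filter`, `octave_bank`, `convolve_same`, `scattering`) and the use of a plain mean over the frame in place of a low-pass averaging filter are all illustrative assumptions; the log compression and normalization usually applied to deep scattering features are omitted, and second-order filters are not restricted to lower frequencies than the first-order ones. The `power` argument switches between the modulus non-linearity (`power=1`) and the squared modulus used for power coefficients (`power=2`).

```python
import cmath
import math


def gabor_filter(center_freq, bandwidth):
    """Complex Gabor band-pass filter (assumed wavelet); frequencies in cycles/sample.

    bandwidth is the std of the Gaussian frequency response, so the
    time-domain envelope has std 1 / (2 * pi * bandwidth).
    """
    sigma = 1.0 / (2.0 * math.pi * bandwidth)
    half = math.ceil(3.0 * sigma)  # truncate the envelope at about 3 sigma
    taps = [
        math.exp(-0.5 * (n / sigma) ** 2) * cmath.exp(2j * math.pi * center_freq * n)
        for n in range(-half, half + 1)
    ]
    norm = sum(abs(t) for t in taps)  # gives unit gain at center_freq
    return [t / norm for t in taps]


def octave_bank(num_filters, highest_freq):
    """Assumption: one filter per octave, bandwidth = half the center frequency."""
    return [
        gabor_filter(highest_freq * 2.0 ** (-j), highest_freq * 2.0 ** (-j) / 2.0)
        for j in range(num_filters)
    ]


def convolve_same(signal, taps):
    """Centered convolution with zero padding; output has len(signal) samples."""
    half = len(taps) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0j
        for k, t in enumerate(taps):
            j = i + half - k
            if 0 <= j < n:
                acc += t * signal[j]
        out.append(acc)
    return out


def scattering(frame, bank1, bank2, power=2):
    """First- and second-order scattering coefficients of one frame.

    power=1: modulus non-linearity; power=2: squared modulus (power coefficients).
    Assumption: a plain mean over the frame stands in for the low-pass filter.
    """
    first, second = [], []
    for psi1 in bank1:
        u1 = [abs(z) ** power for z in convolve_same(frame, psi1)]
        first.append(sum(u1) / len(u1))
        for psi2 in bank2:
            # Second layer: wavelet transform of the first-layer envelope.
            u2 = [abs(z) ** power for z in convolve_same(u1, psi2)]
            second.append(sum(u2) / len(u2))
    return first, second


# Usage: a pure tone versus the same tone with slow amplitude modulation.
N = 512
carrier, modulation = 0.2, 0.01  # cycles per sample
tone = [math.cos(2 * math.pi * carrier * n) for n in range(N)]
am_tone = [(1 + 0.8 * math.cos(2 * math.pi * modulation * n)) * x for n, x in enumerate(tone)]
bank1 = octave_bank(4, 0.35)  # centers 0.35, 0.175, 0.0875, 0.04375
bank2 = octave_bank(4, 0.08)  # centers 0.08, 0.04, 0.02, 0.01
s1_tone, s2_tone = scattering(tone, bank1, bank2)
s1_am, s2_am = scattering(am_tone, bank1, bank2)
```

In this toy setting the first-order coefficients mainly record which band the carrier falls in, while the second-order coefficient pairing the 0.175 band with the 0.01 cycles/sample modulation filter should come out clearly larger for the modulated tone than for the pure tone: the envelope information that a first-order-only representation averages away.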
Pages: 1673-1677
Number of pages: 5
Related papers (50 in total)
  • [1] Enhancing the magnitude spectrum of speech features for robust speech recognition
    Hung, Jeih-weih
    Fan, Hao-teng
    Tu, Wen-hsiang
    EURASIP Journal on Advances in Signal Processing, 2012
  • [2] Deep Convolutional Nets and Robust Features for Reverberation-Robust Speech Recognition
    Mitra, Vikramjit
    Wang, Wen
    Franco, Horacio
    2014 IEEE Workshop on Spoken Language Technology (SLT 2014), 2014, pp. 548-553
  • [3] Human Action Recognition Using Robust Power Spectrum Features
    Ragheb, Hossein
    Velastin, Sergio
    Remagnino, Paolo
    Ellis, Tim
    2008 15th IEEE International Conference on Image Processing, Vols 1-5, 2008, pp. 753-756
  • [4] Cepstrum derived from differentiated power spectrum for robust speech recognition
    Chen, JD
    Paliwal, KK
    Nakamura, S
    Speech Communication, 2003, 41(2-3), pp. 469-484
  • [5] Noise-robust speech recognition based on difference of power spectrum
    Xu, JF
    Wei, G
    Electronics Letters, 2000, 36(14), pp. 1247-1248
  • [6] Modulation Spectrum Power-law Expansion for Robust Speech Recognition
    Fan, Hao-Teng
    Ye, Zi-Hao
    Hung, Jeih-Weih
    2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2013
  • [7] Noise Robust Speech Features for Automatic Continuous Speech Recognition using Running Spectrum Analysis
    Ohnuki, Kazunaga
    Takahashi, Wataru
    Yoshizawa, Shingo
    Miyanaga, Yoshikazu
    2008 International Symposium on Communications and Information Technologies, 2008, pp. 150-153
  • [8] New Features Using Robust MVDR Spectrum of Filtered Autocorrelation Sequence for Robust Speech Recognition
    Seyedin, Sanaz
    Ahadi, Seyed Mohammad
    Gazor, Saeed
    Scientific World Journal, 2013
  • [9] Normalizing the speech modulation spectrum for robust speech recognition
    Xiao, Xiong
    Chng, Eng Siong
    Li, Haizhou
    2007 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol IV, Pts 1-3, 2007, pp. 1021 ff.