Gammatone Wavelet Cepstral Coefficients for Robust Speech Recognition

Cited by: 0
Authors
Adiga, Aniruddha [1]
Magimai-Doss, Mathew [2]
Seelamantula, Chandra Sekhar [1]
Affiliations
[1] Indian Inst Sci, Dept Elect Engn, Bangalore 560012, Karnataka, India
[2] Idiap Res Inst, Martigny, Switzerland
Funding
Swiss National Science Foundation
Keywords
Gammatone wavelets; Auditory modeling; Cepstral coefficients; Speech recognition; REPRESENTATIONS;
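DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract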
We develop noise-robust features using Gammatone wavelets derived from the popular Gammatone functions. These wavelets incorporate characteristics of the human peripheral auditory system, in particular the spatially varying frequency response of the basilar membrane. We refer to the new features as Gammatone Wavelet Cepstral Coefficients (GWCC). The procedure for extracting GWCCs from a speech signal is similar to that of the conventional Mel-Frequency Cepstral Coefficients (MFCC) technique; the difference lies in the type of filterbank used. We replace the conventional mel filterbank in MFCC with a Gammatone wavelet filterbank, which we construct using Gammatone wavelets. We also explore the effect of Gammatone filterbank based features (Gammatone Cepstral Coefficients (GCC)) for robust speech recognition. On the AURORA 2 database, a comparison of GWCCs and GCCs with MFCCs shows that Gammatone-based features yield better recognition performance at low SNRs.
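The abstract describes the extraction pipeline only at a high level: an MFCC-style chain in which the mel filterbank is swapped for an auditory one. A minimal, dependency-free Python sketch of that chain is given here for one frame, under stated assumptions. It uses a textbook approximation of the Gammatone magnitude response with ERB-spaced centre frequencies, so it is closer in spirit to the GCC variant; the paper's Gammatone wavelet construction is not given in this record and would take the place of the hypothetical `gammatone_weights` function. All function names and parameter values (20 filters, 13 coefficients, a 100 Hz lower edge, a Hamming window) are illustrative choices, not the authors' settings.

```python
import cmath
import math


def erb(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore formula)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def erb_spaced_centres(f_lo, f_hi, n_filters):
    """Centre frequencies equally spaced on the ERB-rate scale."""
    def to_rate(f):
        return 21.4 * math.log10(4.37 * f / 1000.0 + 1.0)

    def from_rate(e):
        return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

    lo, hi = to_rate(f_lo), to_rate(f_hi)
    step = (hi - lo) / (n_filters - 1)
    return [from_rate(lo + i * step) for i in range(n_filters)]


def gammatone_weights(freqs, centres, order=4):
    """Approximate squared-magnitude Gammatone response for each band.

    Assumption: this stands in for the paper's Gammatone wavelet filterbank.
    """
    bank = []
    for fc in centres:
        b = 1.019 * erb(fc)  # bandwidth parameter of the Gammatone filter
        bank.append([(1.0 + ((f - fc) / b) ** 2) ** (-order) for f in freqs])
    return bank


def power_spectrum(frame):
    """One-sided power spectrum of a Hamming-windowed frame (direct DFT)."""
    n = len(frame)
    win = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    x = [s * w for s, w in zip(frame, win)]
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(acc) ** 2)
    return spec


def cepstral_coefficients(frame, sample_rate, n_filters=20, n_ceps=13):
    """Filterbank energies -> log -> DCT-II, as in the MFCC recipe."""
    n = len(frame)
    freqs = [k * sample_rate / n for k in range(n // 2 + 1)]
    centres = erb_spaced_centres(100.0, 0.9 * sample_rate / 2.0, n_filters)
    bank = gammatone_weights(freqs, centres)
    spec = power_spectrum(frame)
    # Small floor avoids log(0) on silent frames.
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-12)
             for row in bank]
    return [sum(e * math.cos(math.pi * q * (m + 0.5) / n_filters)
                for m, e in enumerate(log_e))
            for q in range(n_ceps)]


if __name__ == "__main__":
    fs = 8000
    tone = [math.sin(2 * math.pi * 1000.0 * t / fs) for t in range(128)]
    print(cepstral_coefficients(tone, fs))
```

The direct DFT keeps the sketch free of third-party packages; for real frame rates an FFT routine would normally replace `power_spectrum`.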
Pages: 4
Related papers
50 in total
  • [31] Chip design of mel frequency cepstral coefficients for speech recognition
    Wang, JC
    Wang, JF
    Weng, YS
    2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 3658 - 3661
  • [32] Recognition of emotion from speech using evolutionary cepstral coefficients
    Bakhshi, Ali
    Chalup, Stephan
    Harimi, Ali
    Mirhassani, Seyed Mostafa
    MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (47-48) : 35739 - 35759
  • [34] Data-driven Rescaled Teager Energy Cepstral Coefficients for Noise-robust Speech Recognition
    Hsu, Miau-Luan
    Chen, Chia-Ping
    2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2012,
  • [35] Perceptual harmonic cepstral coefficients for speech recognition in noisy environment
    Gu, L
    Rose, K
    2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING, 2001, : 125 - 128
  • [36] Robust Underwater Target Recognition Using Auditory Cepstral Coefficients
    Wu, Yaozhen
    Yang, Yixin
    Tao, Can
    Tian, Feng
    Yang, Long
    OCEANS 2014 - TAIPEI, 2014,
  • [37] Stationary wavelet Filtering Cepstral coefficients (SWFCC) for robust speaker identification
    Missaoui, Ibrahim
    Lachiri, Zied
    APPLIED ACOUSTICS, 2025, 231
  • [38] Cepstral amplitude range normalization for noise robust speech recognition
    Yoshizawa, S
    Hayasaka, N
    Wada, N
    Miyanaga, Y
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2004, E87D (08): : 2130 - 2137
  • [39] Bounded cepstral marginalization of missing data for robust speech recognition
    Kafoori, Kian Ebrahim
    Ahadi, Seyed Mohammad
    COMPUTER SPEECH AND LANGUAGE, 2016, 36 : 1 - 23
  • [40] CEPSTRAL DOMAIN TALKER STRESS COMPENSATION FOR ROBUST SPEECH RECOGNITION
    CHEN, YN
    IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1988, 36 (04): : 433 - 439