Feature Extraction Based on Pitch-Synchronous Averaging for Robust Speech Recognition

Cited by: 9
Authors
Morales-Cordovilla, Juan A. [1]
Peinado, Antonio M. [1]
Sanchez, Victoria [1]
Gonzalez, Jose A. [1]
Affiliation
[1] Univ Granada, Dept Teoria Senal Telemat & Comunicac, E-18071 Granada, Spain
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2011 / Vol. 19 / No. 3
Keywords
Acoustic noise; autocorrelation-based mel frequency cepstral coefficient (AMFCC); autocorrelation estimation; pitch-synchronous analysis; robust speech recognition
DOI
10.1109/TASL.2010.2053846
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In this paper, we propose two estimators for the autocorrelation sequence of a periodic signal in additive noise. Both estimators are formulated using tables that contain all possible products of sample pairs in a speech signal frame. The first estimator is based on pitch-synchronous averaging. We analyze this estimator statistically and show that the signal-to-noise ratio (SNR) can be increased by up to a factor equal to the number of available periods. The second estimator is similar to the first, but it avoids the sample products that are more likely to be affected by noise. We prove that, under certain conditions, this estimator can remove the effect of additive noise in a statistical sense. Both estimators are employed to extract mel frequency cepstral coefficients (MFCCs) as features for robust speech recognition. Although these estimators are initially conceived for voiced speech frames, we extend their application to unvoiced sounds in order to obtain a coherent feature extractor. The experimental results show the superiority of the proposed approach over other MFCC-based front-ends such as higher-lag autocorrelation spectrum estimation (HASE), which also avoids the autocorrelation coefficients more likely to be affected by noise.
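The two autocorrelation estimators described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation and rests on several assumptions: the pitch period T (in samples) is already known and integer-valued, the frame holds at least one complete period, and the second estimator is read as discarding the products of a sample with itself (the diagonal of the product table), whose expectation includes the noise variance when the noise is white and uncorrelated with the signal. The function name `psa_autocorrelation` and the `exclude_self_products` flag are hypothetical; the paper's exact weighting and its handling of fractional pitch periods and unvoiced frames may differ.

```python
import numpy as np


def psa_autocorrelation(x, period, exclude_self_products=False):
    """Pitch-synchronous averaged autocorrelation of one frame (illustrative sketch).

    x: 1-D frame of samples; period: assumed integer pitch period T in samples.
    Returns r[0..T-1], where r[k] averages the product-table entries x[i] * x[j]
    whose within-period positions differ by k (modulo T). Without exclusion this
    equals the circular autocorrelation of the period obtained by averaging the
    M periods sample by sample, which keeps a T-periodic signal intact while
    dividing the variance of white noise by M.
    """
    x = np.asarray(x, dtype=float)
    T = int(period)
    if T < 1:
        raise ValueError("period must be a positive number of samples")
    M = len(x) // T  # number of complete periods in the frame
    if M < 1:
        raise ValueError("frame must contain at least one full period")
    if exclude_self_products and M < 2:
        raise ValueError("excluding self-products at lag 0 needs at least two periods")

    xs = x[: M * T]                             # keep whole periods only
    P = np.outer(xs, xs)                        # product table: P[i, j] = x[i] * x[j]
    pos = np.arange(M * T) % T                  # position of each sample inside its period
    lag = (pos[None, :] - pos[:, None]) % T     # circular lag between positions of i and j
    self_products = np.eye(M * T, dtype=bool)   # diagonal entries x[i] * x[i]

    r = np.empty(T)
    for k in range(T):
        sel = lag == k
        if exclude_self_products:
            # Hypothetical reading of the second estimator: drop the x[i]**2
            # terms. Only lag 0 is affected, since the diagonal has lag 0.
            sel &= ~self_products
        r[k] = P[sel].mean()
    return r
```

For a noise-free periodic frame such as `np.tile([1.0, 2.0, 3.0], 4)` with `period=3`, both variants return the circular autocorrelation of one period, approximately `[4.667, 3.667, 3.667]`. Building the full product table costs memory quadratic in the frame length; it is kept here only because it mirrors the table-based formulation in the abstract.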
Pages: 640-651
Number of pages: 12