Stationary Wavelet Filtering Cepstral Coefficients (SWFCC) for robust speaker identification

Cited: 0
Authors
Missaoui, Ibrahim [1 ,2 ]
Lachiri, Zied [1 ]
Affiliations
[1] Univ Tunis El Manar, Natl Engn Sch Tunis ENIT, Signal Images & Informat Technol Lab, LR-11-ES17,BP 37, Tunis 1002, Tunisia
[2] Univ Gabes, Higher Inst Comp Sci & Multimedia Gabes, Gabes, Tunisia
Keywords
Stationary wavelet filtering cepstral coefficients; SWFCC; SWT; Stationary wavelet packet transform; Implicit Wiener filtering; Feature extraction; GMM-UBM; Robust speaker recognition; Speech wave; Packet
DOI
10.1016/j.apacoust.2024.110435
Chinese Library Classification
O42 [Acoustics]
Subject Classification
070206; 082403
Abstract
Extracting robust and effective speech features is one of the challenging topics in the speaker recognition field, especially in noisy conditions. Robust features can substantially improve the accuracy of recognizing persons from their voice signals under such conditions. This paper proposes a new feature extraction approach called Stationary Wavelet Filtering Cepstral Coefficients (SWFCC) for noisy speaker recognition. The proposed approach incorporates a Stationary Wavelet Filterbank (SWF) and an Implicit Wiener Filtering (IWF) technique. The SWF is based on the stationary wavelet packet transform, which is a shift-invariant transform. The performance of the proposed SWFCC approach is evaluated on the TIMIT dataset in the presence of different types of environmental noise taken from the Aurora dataset. Our experimental results using the Gaussian Mixture Model-Universal Background Model (GMM-UBM) as a classifier show that SWFCC outperforms various feature extraction techniques such as MFCC, PNCC, and GFCC in terms of recognition accuracy.
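The abstract only names the pipeline's building blocks (a shift-invariant stationary wavelet decomposition followed by cepstral-style coefficients); the paper's exact filterbank, Wiener filtering step, and parameters are not given in this record. The following is therefore only a minimal illustrative sketch of the general idea, not the authors' method: an undecimated (stationary) Haar decomposition of a speech frame, subband log energies, and a DCT to decorrelate them, as in conventional cepstral features. All function names and parameter choices here are assumptions.

```python
import math

def haar_swt_level(signal, level):
    """One level of an undecimated (stationary) Haar transform.
    At level j the filter taps are spaced 2**(j-1) samples apart
    (the "a trous" scheme), so there is no downsampling and the
    transform is shift-invariant. Circular extension at the edges."""
    gap = 2 ** (level - 1)
    n = len(signal)
    approx, detail = [], []
    for i in range(n):
        a = signal[i]
        b = signal[(i + gap) % n]
        approx.append((a + b) / math.sqrt(2))
        detail.append((a - b) / math.sqrt(2))
    return approx, detail

def swt_subbands(signal, levels=3):
    """Detail bands of each level plus the final approximation;
    every band has the same length as the input frame."""
    bands = []
    approx = list(signal)
    for j in range(1, levels + 1):
        approx, detail = haar_swt_level(approx, j)
        bands.append(detail)
    bands.append(approx)
    return bands

def dct_ii(x, n_coeffs):
    """Plain DCT-II, used (as in MFCC) to decorrelate log energies."""
    n = len(x)
    return [sum(x[k] * math.cos(math.pi * c * (k + 0.5) / n)
                for k in range(n))
            for c in range(n_coeffs)]

def wavelet_cepstral_frame(frame, levels=3, n_coeffs=4):
    """Cepstral-style coefficients from stationary-wavelet subband
    energies (illustrative only; no Wiener filtering step here)."""
    bands = swt_subbands(frame, levels)
    log_e = [math.log(sum(v * v for v in b) / len(b) + 1e-12)
             for b in bands]
    return dct_ii(log_e, n_coeffs)
```

Because the decomposition is undecimated with circular extension, the subband energies, and hence the coefficients, are invariant to a circular shift of the input frame, which is the shift-invariance property the abstract attributes to the stationary wavelet packet transform.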
Pages: 10
Related Papers
50 records total
  • [41] Robust speaker identification system based on wavelet transform and Gaussian mixture model
    Chen, WC
    Hsieh, CT
    Lai, E
    NATURAL LANGUAGE PROCESSING - IJCNLP 2004, 2005, 3248 : 263 - 271
  • [42] Robust speaker identification system based on wavelet transform and Gaussian mixture model
    Hsieh, CT
    Lai, E
    Wang, YC
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2003, 19 (02) : 267 - 282
  • [43] WAVELET BASED CEPSTRAL COEFFICIENTS FOR NEURAL NETWORK SPEECH RECOGNITION
    Adam, T. B.
    Salam, M. S.
    Gunawan, T. S.
    2013 IEEE INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING APPLICATIONS (IEEE ICSIPA 2013), 2013, : 447 - 451
  • [44] Automatic short utterance speaker recognition using stationary wavelet coefficients of pitch synchronised LP residual
    Sreehari, V. R.
    Mary, Leena
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2022, 25 (01) : 147 - 161
  • [46] Comparative Analysis on Different Cepstral Features for Speaker Identification Recognition
    Hanifa, R. M.
    Isa, K.
    Mohamad, S.
    2020 18TH IEEE STUDENT CONFERENCE ON RESEARCH AND DEVELOPMENT (SCORED), 2020, : 487 - 492
  • [47] Cepstral analysis for formants frequencies determination dedicated to speaker identification
    Gargouri, D
    Frikha, M
    Laffet, MW
    Kamoun, MA
    Ben Hamida, A
    2004 IEEE International Conference on Industrial Technology (ICIT), Vols. 1-3, 2004, : 1298 - 1302
  • [48] The wavelet packet based cepstral features for open set speaker classification in Marathi
    Patil, HA
    Dutta, PK
    Basu, TK
    FROM DATA AND INFORMATION ANALYSIS TO KNOWLEDGE ENGINEERING, 2006, : 134 - +
  • [49] Channel-robust speaker identification using Modified-Mean Cepstral Mean Normalization with Frequency Warping
    Garcia, AA
    Mammone, RJ
    ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999, : 325 - 328
  • [50] Speaker identification based on the use of robust cepstral features obtained from pole-zero transfer functions
    Zilovic, MS
    Ramachandran, RP
    Mammone, RJ
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1998, 6 (03): : 260 - 267