Spectral-temporal receptive fields and MFCC balanced feature extraction for robust speaker recognition

Authors
Jia-Ching Wang
Chien-Yao Wang
Yu-Hao Chin
Yu-Ting Liu
En-Ting Chen
Pao-Chi Chang
Affiliations
[1] National Central University, Department of Computer Science and Information Engineering
[2] National Central University, Department of Communication Engineering
Keywords
STRF; Speaker recognition; Feature extraction; Speaker authentication
DOI
Not available
Abstract
This paper proposes a speaker recognition system that uses acoustic features based on spectral-temporal receptive fields (STRFs). The STRF is derived from physiological models of the mammalian auditory system in the spectral-temporal domain. With the STRF, a signal is expressed in terms of rate (in Hz) and scale (in cycles/octave), which specify the temporal response and the spectral response, respectively. The proposed STRF-based feature is computed in three steps. First, the energy of each scale is calculated from the STRF representation. A logarithmic operation is then applied to the scale energies. Finally, a discrete cosine transform is applied to generate the proposed STRF feature. This paper also presents a feature set that combines the proposed STRF feature with conventional Mel frequency cepstral coefficients (MFCCs). Support vector machines (SVMs) are adopted as the speaker classifiers. To evaluate the proposed speaker recognition system, experiments on 36-speaker recognition were conducted. Compared with the MFCC baseline, the proposed feature set increases the speaker recognition rate by 3.85% on clean speech and by 18.49% on noisy speech. The experimental results demonstrate the effectiveness of adopting the STRF-based feature in speaker recognition.
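The three feature steps named in the abstract (per-scale energy, logarithm, discrete cosine transform) and the combination with MFCCs can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the STRF representation is stored or how scale energy is summed, so the nested-list layout `[scale][rate][frequency]`, the sum of squared magnitudes, the unnormalized DCT-II, the `floor` parameter, and all function names here are hypothetical. Computing the STRF representation itself and the MFCCs is outside the scope of this sketch.

```python
import math


def scale_energies(strf_frame):
    """Energy per scale: sum of squared magnitudes over rate and frequency.

    Assumed layout (hypothetical): strf_frame[scale][rate][frequency].
    """
    return [
        sum(abs(v) ** 2 for rate_row in scale_block for v in rate_row)
        for scale_block in strf_frame
    ]


def dct2(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    n_pts = len(x)
    return [
        sum(x[n] * math.cos(math.pi / n_pts * (n + 0.5) * k) for n in range(n_pts))
        for k in range(n_pts)
    ]


def strf_feature(strf_frame, floor=1e-10):
    """Scale energies -> log -> DCT, following the order given in the abstract."""
    energies = scale_energies(strf_frame)
    # The floor avoids log(0); its value is an assumption of this sketch.
    log_energies = [math.log(max(e, floor)) for e in energies]
    return dct2(log_energies)


def combined_feature(strf_frame, mfcc_frame):
    """Concatenate the STRF feature with an externally computed MFCC vector."""
    return strf_feature(strf_frame) + list(mfcc_frame)


# Toy frame with 2 scales, 2 rates, 2 frequency channels (made-up numbers).
toy_frame = [
    [[1.0, 1.0], [1.0, 1.0]],  # scale 0: energy 4
    [[1.0, 0.0], [0.0, 0.0]],  # scale 1: energy 1
]
toy_mfcc = [0.1, 0.2, 0.3]  # placeholder MFCC values, not real data
feature = combined_feature(toy_frame, toy_mfcc)
```

The resulting per-frame vectors would then be passed to a classifier such as an SVM; how frames are pooled per utterance and how the SVMs are configured is not specified in the abstract, so it is left out here.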
Pages: 4055-4068
Number of pages: 13