Spectral-temporal receptive fields and MFCC balanced feature extraction for robust speaker recognition

Cited by: 1

Authors:

Jia-Ching Wang

Chien-Yao Wang

Yu-Hao Chin

Yu-Ting Liu

En-Ting Chen

Pao-Chi Chang

Affiliations:

[1] National Central University, Department of Computer Science and Information Engineering

[2] National Central University, Department of Communication Engineering

Source:

Multimedia Tools and Applications | 2017 / Vol. 76

Keywords:

STRF; Speaker recognition; Feature extraction; Speaker authentication

DOI:

Not available

Abstract:

This paper proposes a speaker recognition system that uses acoustic features based on spectral-temporal receptive fields (STRFs). The STRF is derived from physiological models of the mammalian auditory system in the spectral-temporal domain. With the STRF, a signal is expressed in terms of rate (in Hz) and scale (in cycles/octave), which specify the temporal response and the spectral response, respectively. The proposed STRF-based feature is computed in three steps. First, the energy of each scale is calculated from the STRF representation. A logarithmic operation is then applied to the scale energies. Finally, a discrete cosine transform is applied to generate the proposed STRF feature. This paper also presents a feature set that combines the proposed STRF feature with conventional Mel-frequency cepstral coefficients (MFCCs). Support vector machines (SVMs) are adopted as the speaker classifiers. To evaluate the performance of the proposed speaker recognition system, experiments on 36-speaker recognition were conducted. Compared with the MFCC baseline, the proposed feature set increases the speaker recognition rates by 3.85% and 18.49% on clean and noisy speech, respectively. The experimental results demonstrate the effectiveness of adopting the STRF-based feature in speaker recognition.
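
The three steps named in the abstract (per-scale energy, logarithm, discrete cosine transform) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the energy is computed, which DCT variant is used, or how the STRF feature is combined with MFCCs. The sketch assumes the energy is a sum of squared response magnitudes per scale channel, an unnormalized DCT-II, and plain concatenation for the combined feature set. All function names, the `floor` parameter, and the input layout (one list of responses per scale, for a single frame) are hypothetical.

```python
import math


def scale_energies(strf_frame):
    """Energy per scale channel (assumption: sum of squared magnitudes)."""
    return [sum(abs(v) ** 2 for v in channel) for channel in strf_frame]


def dct_ii(values):
    """Unnormalized DCT-II of a sequence (assumed variant, not from the paper)."""
    n = len(values)
    return [
        sum(x * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for i, x in enumerate(values))
        for k in range(n)
    ]


def strf_feature(strf_frame, num_coeffs=None, floor=1e-10):
    """Scale energies -> log -> DCT, following the order given in the abstract.

    `floor` is a hypothetical guard that avoids log(0) on silent channels.
    """
    energies = scale_energies(strf_frame)
    log_energies = [math.log(max(e, floor)) for e in energies]
    coeffs = dct_ii(log_energies)
    return coeffs if num_coeffs is None else coeffs[:num_coeffs]


def combine_with_mfcc(strf_coeffs, mfcc_coeffs):
    """Combined feature set (assumption: simple concatenation)."""
    return list(strf_coeffs) + list(mfcc_coeffs)


# Toy frame: two scale channels, two response values each (made-up numbers).
frame = [[1.0, 1.0], [1.0, 1.0]]
feature = strf_feature(frame)
combined = combine_with_mfcc(feature, [0.5, -0.25, 0.125])
```

In this toy frame both channels have the same energy, so all of the information ends up in the first DCT coefficient and the second is zero. The combined vector would then be passed to a classifier such as an SVM, which the sketch leaves out.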

Pages: 4055-4068

Page count: 13
