Speaker Identification based on MFSC voice feature extraction using Transformer

Cited by: 3
Authors
Bao, Liao [1]
Zuo, Yi [1]
Affiliation
[1] Dalian Maritime University, Dalian, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Speaker Identification; Voiceprint Feature; Extraction; MFSC; MFCC; Neural Network; Support Vector Machines; Joint Factor Analysis
DOI
10.1109/ICDMW60847.2023.00008
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speaker identification is a biometric authentication technology that automatically identifies a speaker from voice parameters. Its core technique is to extract, from collected speech samples, the voice features that best reflect a speaker's individual characteristics, and then to train models on these features for tasks such as identifying the speaker and recognizing voiceprints. In speaker identification research, voiceprint feature extraction largely determines the accuracy of the identification model. Among the many voiceprint features, Mel Frequency Cepstral Coefficients (MFCC) are widely used in voiceprint identification systems because of the strong performance of Mel filters. However, several studies have shown that MFCC features are not completely correlated globally, and that only a few feature vectors are sufficient to represent most of the information in a signal. To address this limitation, we propose a new spectral representation of compressed speech, named Mel Frequency Spectral Coefficients (MFSC), which omits the discrete cosine transform (DCT) step used in MFCC extraction. In the experiments, MFCC serves as the comparison feature, and end-to-end neural networks (bidirectional GRU, bidirectional LSTM, and Transformer) serve as the identification models. Experiments on 921 voice data items from the LibriSpeech database show that MFSC with the Transformer model achieves higher test accuracy than the MFCC models, reducing the error rate from 0.090 to 0.079.
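The difference between the two features can be illustrated with a minimal NumPy sketch. It assumes that MFSC is the log Mel-filterbank output of a conventional MFCC front end, so MFCC is simply MFSC followed by a DCT. The framing parameters (16 kHz audio, 400-sample Hamming frames with a 160-sample hop, 512-point FFT, 26 triangular Mel filters, 13 cepstral coefficients) and all function names (`mel_filterbank`, `mfsc`, `mfcc`, `dct_ii_ortho`) are illustrative assumptions, not the paper's settings or code.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the Mel scale, shape (n_filters, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bin_points = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_bins))
    for i in range(1, n_filters + 1):
        left, center, right = bin_points[i - 1], bin_points[i], bin_points[i + 1]
        for k in range(left, center):   # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping Hamming-windowed frames."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)

def mfsc(signal, sample_rate=16000, frame_len=400, hop=160, n_fft=512, n_filters=26):
    """Log Mel-filterbank energies: the MFCC pipeline with the DCT step left out."""
    frames = frame_signal(signal, frame_len, hop)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    return np.log(energies + 1e-10)  # small floor avoids log(0); shape (n_frames, n_filters)

def dct_ii_ortho(x):
    """Orthonormal DCT-II along the last axis (same convention as scipy's norm='ortho')."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    basis[0] *= np.sqrt(1.0 / n)
    basis[1:] *= np.sqrt(2.0 / n)
    return x @ basis.T

def mfcc(signal, n_ceps=13, **kwargs):
    """MFCC = DCT of MFSC, truncated to the first n_ceps coefficients."""
    return dct_ii_ortho(mfsc(signal, **kwargs))[:, :n_ceps]

if __name__ == "__main__":
    x = np.random.default_rng(0).standard_normal(16000)  # 1 s of noise at 16 kHz
    print(mfsc(x).shape, mfcc(x).shape)
```

For one second of 16 kHz audio this yields 98 frames: 26 MFSC values per frame versus 13 MFCCs. Because the full orthonormal DCT is invertible, the information MFCC discards comes from truncating the higher coefficients, while MFSC keeps the filterbank energies in their original spectral order. For scale, the reported error-rate drop from 0.090 to 0.079 is an absolute reduction of 0.011, or roughly 12% relative to the MFCC baseline.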
Pages: 1-7
Page count: 7
Related papers (50 in total)
  • [1] Speaker Identification based on Hybrid Feature Extraction Techniques
    Abualadas, Feras E.
    Zeki, Akram M.
    Al-Ani, Muzhir Shaban
    Messikh, Az-Eddine
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2019, 10 (03) : 322 - 327
  • [2] Identification of Speaker from Disguised Voice Using MFCC Feature Extraction, Chi-Square and Classification Technique
    Singh, Mahesh K.
    WIRELESS PERSONAL COMMUNICATIONS, 2024, 138 (02) : 973 - 987
  • [3] Effectiveness of Feature Collaboration in Speaker Identification for Voice Biometrics
    Das, Arunima
    Roy, Lakshi Prosad
    Das, Santos Kumar
    2023 INTERNATIONAL CONFERENCE ON COMPUTER, ELECTRICAL & COMMUNICATION ENGINEERING, ICCECE, 2023,
  • [4] Improving Speaker Identification via Singular Value Decomposition Based Feature Transformer
    Mishra, Bibhu Prasad
    Chakroborty, Sandipan
    Saha, Goutam
    2008 IEEE REGION 10 CONFERENCE: TENCON 2008, VOLS 1-4, 2008, : 827 - 832
  • [5] Discriminative feature extraction applied to speaker identification
    Nealand, JH
    Bradley, AB
    Lech, M
    2002 6TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS I AND II, 2002, : 484 - 487
  • [6] Speaker Identification Using MFCC Feature Extraction ANN Classification Technique
    Singh, Mahesh K.
    WIRELESS PERSONAL COMMUNICATIONS, 2024, 136 (01) : 453 - 467
  • [7] A new approach to designing a feature extractor in speaker identification based on discriminative feature extraction
    Miyajima, C
    Watanabe, H
    Tokuda, K
    Kitamura, T
    Katagiri, S
    SPEECH COMMUNICATION, 2001, 35 (3-4) : 203 - 218
  • [8] A review on Speech and Speaker Authentication System using Voice Signal feature selection and extraction
    Chandra, E.
    Sunitha, C.
    2009 IEEE INTERNATIONAL ADVANCE COMPUTING CONFERENCE, VOLS 1-3, 2009, : 1341 - +
  • [9] An Efficient Text-Independent Speaker Identification Using Feature Fusion and Transformer Model
    Khan, Arfat Ahmad
    Jahangir, Rashid
    Alroobaea, Roobaea
    Alyahyan, Saleh Yahya
    Almulhi, Ahmed H.
    Alsafyani, Majed
    Wechtaisong, Chitapong
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 75 (02) : 4085 - 4100
  • [10] PHYSIOLOGICALLY-MOTIVATED FEATURE EXTRACTION FOR SPEAKER IDENTIFICATION
    Wang, Jianglin
    Johnson, Michael T.
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,