Fusing MFCC and LPC Features Using 1D Triplet CNN for Speaker Recognition in Severely Degraded Audio Signals

Cited by: 115
Authors
Chowdhury, Anurag [1]
Ross, Arun [1]
Affiliation
[1] Michigan State Univ, Dept Comp Sci Engn, E Lansing, MI 48823 USA
Keywords
Speaker recognition; Speech recognition; Noise measurement; Mel frequency cepstral coefficient; Speech processing; Feature extraction; Production; degraded audio; deep learning; MFCC; LPC; 1-D CNN; feature-level fusion; NOISE; IDENTIFICATION; SPEECH; MACHINES
DOI
10.1109/TIFS.2019.2941773
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Speaker recognition algorithms are negatively impacted by the quality of the input speech signal. In this work, we approach the problem of speaker recognition from severely degraded audio data by judiciously combining two commonly used features: Mel Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding (LPC). Our hypothesis rests on the observation that MFCC and LPC capture two distinct aspects of speech, viz., speech perception and speech production. A carefully crafted 1D Triplet Convolutional Neural Network (1D-Triplet-CNN) is used to combine these two features in a novel manner, thereby enhancing the performance of speaker recognition in challenging scenarios. Extensive evaluation on multiple datasets, different types of audio degradation, multilingual speech, varying lengths of audio samples, etc., conveys the efficacy of the proposed approach over existing speaker recognition methods, including those based on iVector and xVector.
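The two ideas named in the abstract, feature-level fusion of MFCC and LPC and triplet-based training, can be illustrated with a minimal sketch. The abstract does not specify the network or the loss, so everything here is an assumption for illustration only: the helper names `fuse_features`, `cosine_distance`, and `triplet_loss` are hypothetical, fusion is shown as simple channel stacking, the distance is cosine distance, and the margin value is arbitrary. The sketch is not the authors' implementation.

```python
import math


def fuse_features(mfcc, lpc):
    """Stack two per-frame feature matrices into a 2-channel patch.

    mfcc, lpc: lists of frames, each frame a list of coefficients.
    Channel stacking is an assumed form of feature-level fusion.
    """
    if len(mfcc) != len(lpc):
        raise ValueError("MFCC and LPC must cover the same number of frames")
    for m_frame, l_frame in zip(mfcc, lpc):
        if len(m_frame) != len(l_frame):
            raise ValueError("each frame needs the same number of coefficients")
    return [mfcc, lpc]  # shape: (2 channels, frames, coefficients)


def cosine_distance(u, v):
    """Return 1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.4):
    """Generic triplet loss on embeddings (the margin is an assumed value).

    The loss is zero once the anchor is closer to the same-speaker
    embedding than to the different-speaker embedding by at least `margin`.
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```

In a full system the fused patch would be passed through a 1D convolutional network to produce the embeddings that the loss compares; that network is omitted here because the abstract gives no architectural details.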
Pages: 1616-1629
Number of pages: 14