Modified layer deep convolution neural network for text-independent speaker recognition

Cited by: 9
Authors
Karthikeyan, V [1]
Priyadharsini, Suja S. [2]
Affiliations
[1] Kalasalingam Inst Technol, Dept Elect & Commun Engn, Krishnankoil, Tamil Nadu, India
[2] Anna Univ, Dept Elect & Commun Engn, Reg Campus Tirunelveli, Tirunelveli, Tamil Nadu, India
Keywords
Speaker identification; deep learning; CNN; spectrogram; MFCC;
DOI
10.1080/0952813X.2022.2092560
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speaker recognition is the task of automatically identifying a speaker from speaker-specific features. It is a popular and actively studied topic in speech technology, with broad scope for research and applications in areas such as forensics, authentication, and security. This work proposes a modified deep convolutional neural network for speaker identification, with improved convolution, activation, and pooling layers, trained with the Adam optimiser. Compared with a generic convolutional neural network, the proposed architecture increases prediction accuracy and reduces the loss. The architecture is validated on several datasets, and the results show that the modified CNN outperforms other state-of-the-art models in both accuracy (99% on average) and loss (1% on average). The analysis finds that the modified CNN is well suited to real-time speaker identification, because its performance does not degrade under the noise and interference introduced by the recording environment. Relevance of the work: Speaker recognition is an area of interest in which ML and DL schemes, when combined, have the potential to make a significant impact on automation and authentication. A modified CNN can improve the process by mitigating issues such as false positives and background noise. The approach could be extended to a raga identification and therapy mechanism for use in treating diseases.
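The abstract names the Adam optimiser but gives no hyperparameters. As a rough illustration only, the standard Adam update rule can be sketched in plain Python as below. The function names `adam_step` and `minimise_quadratic`, the toy quadratic loss, and all default values are assumptions made for this sketch; they are not taken from the paper and do not reproduce its training setup.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one standard Adam update to a list of scalar parameters.

    theta, grad, m, v are equal-length lists; t is the 1-based step count.
    Returns the updated (theta, m, v). Defaults are common textbook values,
    assumed here rather than taken from the paper.
    """
    new_theta, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(theta, grad, m, v):
        # Exponential moving averages of the gradient and its square.
        m_i = beta1 * m_i + (1 - beta1) * g
        v_i = beta2 * v_i + (1 - beta2) * g * g
        # Bias correction compensates for the zero initialisation.
        m_hat = m_i / (1 - beta1 ** t)
        v_hat = v_i / (1 - beta2 ** t)
        new_theta.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_theta, new_m, new_v


def minimise_quadratic(steps=2000, lr=0.05):
    """Minimise the toy loss f(x, y) = (x - 3)^2 + (y + 1)^2 with Adam."""
    theta = [0.0, 0.0]
    m = [0.0, 0.0]
    v = [0.0, 0.0]
    for t in range(1, steps + 1):
        grad = [2 * (theta[0] - 3), 2 * (theta[1] + 1)]
        theta, m, v = adam_step(theta, grad, m, v, t, lr=lr)
    return theta
```

Calling `minimise_quadratic()` should move the parameters towards the minimiser (3, -1) of the toy loss. In a real speaker-identification model the gradient would come from backpropagation through the CNN rather than from a closed-form expression.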
Pages: 273-285
Page count: 13
Related papers
50 in total
  • [31] Exploring discriminative learning for text-independent speaker recognition
    Liu, Ming
    Zhang, Zhengyou
    Hasegawa-Johnson, Mark
    Huang, Thomas S.
    2007 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-5, 2007, : 56 - 59
  • [32] A discriminative training approach for text-independent speaker recognition
    Hong, QY
    Kwong, S
    SIGNAL PROCESSING, 2005, 85 (07) : 1449 - 1463
  • [33] I-MATRIX FOR TEXT-INDEPENDENT SPEAKER RECOGNITION
    He, Liang
    Liu, Jia
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7194 - 7198
  • [34] Text-independent speaker recognition using graph matching
    Hautamaki, Ville
    Kinnunen, Tomi
    Franti, Pasi
    PATTERN RECOGNITION LETTERS, 2008, 29 (09) : 1427 - 1432
  • [35] Text-Independent Speaker Verification Based on Triplet Convolutional Neural Network Embeddings
    Zhang, Chunlei
    Koishida, Kazuhito
    Hansen, John H. L.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (09) : 1633 - 1644
  • [36] Neural Embedding Extractors for Text-Independent Speaker Verification
    Alam, Jahangir
    Kang, Woohyun
    Fathan, Abderrahim
    SPEECH AND COMPUTER, SPECOM 2022, 2022, 13721 : 10 - 23
  • [37] Automatic text-independent speaker verification using convolutional deep belief network
    Rakhmanenko, I. A.
    Shelupanov, A. A.
    Kostyuchenko, E. Y.
    COMPUTER OPTICS, 2020, 44 (04) : 596 - +
  • [38] Text-independent speaker identification using a hybrid neural network and conformity approach
    Ouzounov, A
    1997 IEEE INTERNATIONAL CONFERENCE ON NEURAL NETWORKS, VOLS 1-4, 1997, : 2098 - 2102
  • [39] Text-independent speaker authentication with spiking neural networks
    Wysoski, Simei Gomes
    Benuskova, Lubica
    Kasabov, Nikola
    ARTIFICIAL NEURAL NETWORKS - ICANN 2007, PT 2, PROCEEDINGS, 2007, 4669 : 758 - +
  • [40] Neural networks for improved text-independent speaker identification
    Yue, XC
    Ye, DT
    Zheng, CX
    Wu, XY
    IEEE ENGINEERING IN MEDICINE AND BIOLOGY MAGAZINE, 2002, 21 (02) : 53 - 58