Modified layer deep convolution neural network for text-independent speaker recognition

Cited by: 9
Authors
Karthikeyan, V. [1]
Priyadharsini, Suja S. [2]
Affiliations
[1] Kalasalingam Inst Technol, Dept Elect & Commun Engn, Krishnankoil, Tamil Nadu, India
[2] Anna Univ, Dept Elect & Commun Engn, Reg Campus Tirunelveli, Tirunelveli, Tamil Nadu, India
Keywords
Speaker identification; deep learning; CNN; spectrogram; MFCC
DOI
10.1080/0952813X.2022.2092560
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speaker recognition is the task of automatically identifying a speaker from speaker-specific features. It has long been a popular and actively studied topic in speech technology, and it finds applications in areas such as forensics, authentication, and security. In this work, a modified deep convolutional neural network (CNN) structure is proposed for speaker identification, with improved convolution, activation, and pooling layers and the Adam optimiser. The proposed architecture increases prediction accuracy and reduces the loss compared with a generic CNN scheme. Its performance is validated on various datasets, and the outcomes show that the modified CNN performs better than other state-of-the-art models in both accuracy (average 99%) and loss (average 1%). The analysis finds that the modified CNN is well suited to real-time speaker identification applications, as the efficacy of the model does not degrade under the noise and interference introduced by the recording environment.

Relevance of the work: Speaker recognition is an area in which machine learning (ML) and deep learning (DL) schemes, when combined, have the potential to be highly influential in automation and authentication. Using a modified CNN can improve the process by mitigating issues such as false positives and background noise. The process can be extended to create a raga identification and therapy mechanism that can be used to treat diseases.
Pages: 273-285
Page count: 13