Modified layer deep convolution neural network for text-independent speaker recognition

Cited by: 9
Authors
Karthikeyan, V. [1]
Priyadharsini, Suja S. [2]
Affiliations
[1] Kalasalingam Inst Technol, Dept Elect & Commun Engn, Krishnankoil, Tamil Nadu, India
[2] Anna Univ, Dept Elect & Commun Engn, Reg Campus Tirunelveli, Tirunelveli, Tamil Nadu, India
Keywords
Speaker identification; deep learning; CNN; spectrogram; MFCC
DOI
10.1080/0952813X.2022.2092560
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speaker recognition is the task of automatically identifying a speaker from speaker-specific features. It is a popular and actively studied topic in speech technology, with applications in areas such as forensics, authentication, and security. This work proposes a modified deep convolutional neural network (CNN) for speaker identification, with improved convolution, activation, and pooling layers and the Adam optimiser. Compared with a generic CNN, the proposed architecture increases prediction accuracy and reduces the loss. The architecture is validated on several datasets, and the results show that the modified CNN outperforms other state-of-the-art models in both accuracy (avg 99%) and loss (avg 1%). The analysis indicates that the modified CNN is well suited to real-time speaker identification, because its performance does not degrade under the noise and interference introduced by the recording environment. Relevance of the work: Speaker recognition is an area in which machine learning (ML) and deep learning (DL) schemes, when combined, have considerable potential for automation and authentication. A modified CNN can improve the process by reducing issues such as false positives and sensitivity to background noise. The approach could be extended to a raga identification and therapy mechanism for use in treating diseases.
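The abstract names the Adam optimiser but gives no update rule or hyperparameters. As a minimal sketch, assuming the standard Adam update with its commonly used default values (these are not the authors' settings), one optimisation step could look like the following. The function names `adam_step` and `minimise_quadratic` are hypothetical and chosen only for illustration.

```python
import math


def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One generic Adam update over a flat list of parameters.

    Assumption: standard Adam with common default hyperparameters,
    not the configuration used in the paper.
    t is the 1-based step count. Returns new (params, m, v).
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g        # running mean of gradients
        v_i = beta2 * v_i + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m_i / (1 - beta1 ** t)             # bias-corrected first moment
        v_hat = v_i / (1 - beta2 ** t)             # bias-corrected second moment
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


def minimise_quadratic(steps=2000, lr=0.01):
    """Toy usage: minimise f(x) = (x - 3)^2 starting from x = 0."""
    x, m, v = [0.0], [0.0], [0.0]
    for t in range(1, steps + 1):
        grad = [2.0 * (x[0] - 3.0)]                # derivative of (x - 3)^2
        x, m, v = adam_step(x, grad, m, v, t, lr=lr)
    return x[0]
```

In a CNN, `params` would hold the flattened layer weights and `grads` the gradients of the loss with respect to them. The toy quadratic here only shows the update moving a parameter towards its minimiser.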
Pages: 273-285
Number of pages: 13