Modified layer deep convolution neural network for text-independent speaker recognition

Cited by: 9

Authors:

Karthikeyan, V [1]

Priyadharsini, Suja S. [2]

Affiliations:

[1] Kalasalingam Inst Technol, Dept Elect & Commun Engn, Krishnankoil, Tamil Nadu, India

[2] Anna Univ, Dept Elect & Commun Engn, Reg Campus Tirunelveli, Tirunelveli, Tamil Nadu, India

Source:

JOURNAL OF EXPERIMENTAL & THEORETICAL ARTIFICIAL INTELLIGENCE | 2024, Vol. 36, No. 02

Keywords:

Speaker identification; deep learning; CNN; spectrogram; MFCC

DOI:

10.1080/0952813X.2022.2092560

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Speaker recognition is the task of automatically identifying a speaker from speaker-specific features. It is a popular and actively studied topic in speech technology, with applications in areas such as forensics, authentication, and security. This work proposes a modified deep convolutional neural network (CNN) for speaker identification, with improved convolution, activation, and pooling layers and the Adam optimiser. Compared with a generic CNN, the proposed architecture increases prediction accuracy and reduces the loss. The architecture is validated on several datasets, and the results show that the modified CNN outperforms other state-of-the-art models in both accuracy (average 99%) and loss (average 1%). The analysis indicates that the modified CNN is well suited to real-time speaker identification, because its performance does not degrade under the noise and interference introduced by the recording environment. Relevance of the work: Speaker recognition is an area in which machine learning (ML) and deep learning (DL) schemes, when combined, have considerable potential for automation and authentication. A modified CNN can improve the process by reducing the impact of issues such as false positives and background noise. The approach could be extended to a raga identification and therapy mechanism for use in treating diseases.


Pages: 273-285

Page count: 13

