Modified layer deep convolution neural network for text-independent speaker recognition

Cited by: 9
Authors
Karthikeyan, V [1]
Priyadharsini, Suja S. [2]
Affiliations
[1] Kalasalingam Inst Technol, Dept Elect & Commun Engn, Krishnankoil, Tamil Nadu, India
[2] Anna Univ, Dept Elect & Commun Engn, Reg Campus Tirunelveli, Tirunelveli, Tamil Nadu, India
Keywords
Speaker identification; deep learning; CNN; spectrogram; MFCC
DOI
10.1080/0952813X.2022.2092560
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speaker recognition is the task of automatically identifying a speaker from speaker-specific features. It is a popular and actively studied topic in speech technology, with wide scope for research and applications in areas such as forensics, authentication, and security. In this work, a modified deep convolutional neural network structure is proposed for speaker identification, with improved convolution, activation, and pooling layers and the Adam optimiser. The proposed architecture yields higher prediction accuracy and a lower loss than the generic Convolutional Neural Network scheme. The performance of the proposed architecture is validated on various datasets, and the outcomes show that the modified CNN performs better than other state-of-the-art models in both accuracy (avg 99%) and loss (avg 1%). The analysis finds that the Modified-CNN is well suited to real-time speaker identification applications, as the efficacy of the model does not degrade under the noise and interference present in the recording environment. Relevance of the work: Speaker recognition is an area of interest in which ML and DL schemes, when combined, have the potential to make history in the areas of automation and authentication. Using a modified CNN can enhance the process by reducing issues such as false positives and background noise. This process can be expanded to create a Raga Identification and Therapy mechanism that can be used to treat diseases.
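Illustrative note: the abstract names convolution, activation, and pooling layers but does not describe how the paper modifies them. The sketch below shows only the generic, textbook form of those three stages applied to a small spectrogram-like array, in plain Python. The input values, the 2x2 kernel, and all function names are hypothetical and chosen for illustration; this is not the authors' architecture.

```python
# Generic CNN building blocks on a toy "spectrogram" (pure Python, no dependencies).
# Assumption: these are the textbook layers, NOT the modified layers from the paper.

def conv2d_valid(image, kernel):
    """Valid-mode 2-D cross-correlation (the operation CNN 'convolution' layers compute)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map):
    """Element-wise rectified linear activation."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Hypothetical 5x5 patch of spectrogram magnitudes.
spectrogram = [
    [0.0, 1.0, 2.0, 1.0, 0.0],
    [1.0, 3.0, 5.0, 3.0, 1.0],
    [2.0, 5.0, 9.0, 5.0, 2.0],
    [1.0, 3.0, 5.0, 3.0, 1.0],
    [0.0, 1.0, 2.0, 1.0, 0.0],
]
# Hypothetical 2x2 kernel that responds to increases along each row.
edge_kernel = [[-1.0, 1.0], [-1.0, 1.0]]

conv_out = conv2d_valid(spectrogram, edge_kernel)   # 4x4 feature map
features = max_pool2d(relu(conv_out))               # 2x2 pooled map
print(features)
```

In a trained network the kernel values would be learned (for example with an optimiser such as Adam, which the abstract mentions) rather than fixed by hand as they are here.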
Pages: 273-285
Number of pages: 13
Related papers
50 in total
  • [1] Deep Speaker Embeddings with Convolutional Neural Network on Supervector for Text-Independent Speaker Recognition
    Cai, Danwei
    Cai, Zexin
    Li, Ming
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 1478 - 1482
  • [2] Adaptive Convolutional Neural Network for Text-Independent Speaker Recognition
    Kim, Seong-Hu
    Park, Yong-Hwa
    INTERSPEECH 2021, 2021, : 66 - 70
  • [3] Deep Neural Network Embeddings for Text-Independent Speaker Verification
    Snyder, David
    Garcia-Romero, Daniel
    Povey, Daniel
    Khudanpur, Sanjeev
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 999 - 1003
  • [5] TEXT-INDEPENDENT SPEAKER RECOGNITION
    ATAL, BS
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1972, 52 (01): : 181 - &
  • [6] Research on text-independent speaker recognition methods using wavelet neural network
    Bai, Ying
    Zhao, Zhen-Dong
    Qi, Yin-Cheng
    Wang, Bin
    Guo, Jian-Yong
    Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology, 2006, 28 (06): : 1036 - 1039
  • [7] Text-Independent Speaker Identification Through Feature Fusion and Deep Neural Network
    Jahangir, Rashid
    Teh, Ying Wah
    Memon, Nisar Ahmed
    Mujtaba, Ghulam
    Zareei, Mahdi
    Ishtiaq, Uzair
    Akhtar, Muhammad Zaheer
    Ali, Ihsan
    IEEE ACCESS, 2020, 8 : 32187 - 32202
  • [8] Deep Neural Network Embeddings with Gating Mechanisms for Text-Independent Speaker Verification
    You, Lanhua
    Guo, Wu
    Dai, Li-Rong
    Du, Jun
    INTERSPEECH 2019, 2019, : 1168 - 1172
  • [9] Segment unit shuffling layer in deep neural networks for text-independent speaker verification
    Heo, Jungwoo
    Shim, Hye-jin
    Kim, Ju-ho
    Yu, Ha-Jin
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2021, 40 (02): : 148 - 154
  • [10] An integrated system for text-independent speaker recognition using binary neural network classifiers
    Hou, FL
    Wang, BX
    2000 5TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS I-III, 2000, : 710 - 713