A focus module-based lightweight end-to-end CNN framework for voiceprint recognition

Cited by: 9
Authors
Velayuthapandian, Karthikeyan [1]
Subramoniam, Suja Priyadharsini [2]
Affiliations
[1] Mepco Schlenk Engn Coll, Dept Elect & Commun Engn, Sivakasi, Tamil Nadu, India
[2] Anna Univ Reg Campus, Dept Elect & Commun Engn, Tirunelveli, Tamil Nadu, India
Keywords
Speaker recognition; Deep neural network; Spectrogram; 1-D CNN; Focus module; SUPPORT VECTOR MACHINES; SPEAKER; SYSTEM;
DOI
10.1007/s11760-023-02500-7
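Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic and communication technology];
Discipline classification code
0808; 0809;
Abstract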
Speaker identification is the process of determining who is speaking from a sequence of time-series audio data. Most modern experimental approaches rely on one of two types of neural network: convolutional neural networks (CNNs) and deep neural networks. This work presents a CNN model for speaker identification that uses a jump-connected one-dimensional convolutional neural network (1-D CNN) with a focus module (FM). The 1-D convolutional layer integrated with the FM extracts speaker characteristics and lessens heterogeneity in the temporal and spatial domains, allowing for quicker layer processing. Furthermore, layered CNN hopping interconnections are employed to overcome connectivity glitches, and a solution based on combined softmax loss and smooth L1-norm regulation is presented to increase efficiency. The proposed network model was evaluated on the ELSDSR, TIMIT, NIST, 16,000 PCM, and experimental audio datasets. According to the experimental data, the equal error rate (EER) of the end-to-end CNN for voiceprint identification improves on the baseline approaches by 9.02%. In experiments, the proposed speaker recognition (SR) model, referred to as the deep FM-1D CNN, achieved a recognition accuracy of 99.21%. Moreover, the observations demonstrate that the proposed network model is more robust than other models.
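The abstract reports results as an equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The record does not say how the authors computed it, so the following is only a minimal sketch of the standard definition. The function name `equal_error_rate` and the trial scores are hypothetical and chosen purely for illustration; they are not data from the paper.

```python
def equal_error_rate(genuine, impostor):
    """Return (eer, threshold) for similarity scores.

    Assumption: a higher score means "more likely the same speaker".
    `genuine` holds scores of same-speaker trials, `impostor` holds
    scores of different-speaker trials.
    """
    best = None
    # Sweep every observed score as a candidate decision threshold.
    for t in sorted(set(genuine) | set(impostor)):
        # False rejection rate: genuine trials scoring below the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        # False acceptance rate: impostor trials at or above the threshold.
        far = sum(s >= t for s in impostor) / len(impostor)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical trial scores, for illustration only.
genuine_scores = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor_scores = [0.7, 0.5, 0.3, 0.2, 0.1]

eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With a finite set of trials the two error rates rarely cross exactly, so the sketch reports their average at the threshold where they are closest. A lower EER indicates a better verification system.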
Pages: 2817-2825
Page count: 9