A focus module-based lightweight end-to-end CNN framework for voiceprint recognition

Cited by: 9
Authors
Velayuthapandian, Karthikeyan [1 ]
Subramoniam, Suja Priyadharsini [2 ]
Affiliations
[1] Mepco Schlenk Engn Coll, Dept Elect & Commun Engn, Sivakasi, Tamil Nadu, India
[2] Anna Univ Reg Campus, Dept Elect & Commun Engn, Tirunelveli, Tamil Nadu, India
Keywords
Speaker recognition; Deep neural network; Spectrogram; 1-D CNN; Focus module; SUPPORT VECTOR MACHINES; SPEAKER; SYSTEM
DOI
10.1007/s11760-023-02500-7
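Chinese Library Classification (CLC) number
TM [Electrotechnics]; TN [Electronic technology, communication technology]
Subject classification code
0808; 0809
Abstract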
Speaker identification is the task of determining who is speaking from a sequence of time-series (speech) data. Most modern experimental approaches rely on two types of neural networks: convolutional neural networks (CNNs) and deep neural networks. This work presents a CNN model for speaker identification that uses a one-dimensional convolutional neural network (1-D CNN) with jump (skip) connections and a focus module (FM). In the presented model, 1-D convolutional layers integrated with the FM extract speaker characteristics and reduce heterogeneity in the temporal and spatial domains, which allows faster layer processing. Furthermore, hopping (skip) connections between the stacked CNN layers are employed to overcome connectivity problems. A combined regularization scheme based on softmax loss and the smooth L1 norm is also presented to increase efficiency. The proposed network model was evaluated on the ELSDSR, TIMIT, NIST, 16,000 PCM, and experimental audio datasets. According to the experimental data, the equal error rate (EER) of the end-to-end CNN for voiceprint identification is 9.02% higher than that of the baseline approaches. In experiments, the proposed speaker recognition (SR) model, which we refer to as the deep FM-1D CNN, achieved a recognition accuracy of 99.21%. Moreover, the observations demonstrate that the proposed network model is more robust than other models.
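The abstract names two quantities without defining them: the combined softmax/smooth-L1 objective and the equal error rate (EER). The following is a minimal Python sketch of both, under stated assumptions. The abstract does not say what the smooth L1 term is applied to. Here it is assumed to be a penalty on the model weights, scaled by a hypothetical coefficient `lam` and with a hypothetical transition point `beta`; this is an illustration, not the authors' formulation. The EER is computed with a plain threshold sweep over verification scores. That is a common approach, but not necessarily the evaluation protocol used in the paper. All function names are hypothetical.

```python
import math
from typing import Sequence


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax (cross-entropy) loss for one utterance over speaker classes."""
    m = max(logits)  # subtract the max logit for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def smooth_l1(x: float, beta: float = 1.0) -> float:
    """Smooth L1: quadratic for |x| < beta, linear (|x| - beta/2) beyond it."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta


def combined_loss(logits: Sequence[float], target: int,
                  weights: Sequence[float],
                  lam: float = 1e-3, beta: float = 1.0) -> float:
    """ASSUMED form: softmax loss + lam * sum of smooth-L1 penalties on weights.

    `lam` and `beta` are hypothetical hyperparameters, not values from the paper.
    """
    penalty = sum(smooth_l1(w, beta) for w in weights)
    return softmax_cross_entropy(logits, target) + lam * penalty


def equal_error_rate(genuine: Sequence[float],
                     impostor: Sequence[float]) -> float:
    """Sweep every observed score as a threshold (accept if score >= t).

    Returns the mean of the false-acceptance and false-rejection rates at the
    threshold where the two rates are closest to each other.
    """
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

For example, consider genuine scores `[0.9, 0.8, 0.7, 0.4]` and impostor scores `[0.6, 0.3, 0.2, 0.1]`. At threshold 0.6, one impostor trial is accepted and one genuine trial is rejected, so `equal_error_rate` returns `0.25`.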
Pages: 2817-2825
Number of pages: 9
Journal: Signal, Image and Video Processing, Vol. 17, 2023