A High-Performance Neural Network SoC for End-to-End Speaker Verification

Authors
Tsai, Tsung-Han [1]
Chiang, Meng-Jui [1]
Affiliations
[1] Natl Cent Univ, Dept Elect Engn, Taoyuan 32001, Taiwan
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Speaker verification (SV); speaker identification; x-vector; RISC-V; system-on-chip (SoC); GMM;
DOI
10.1109/ACCESS.2024.3491780
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Using neural networks to recognize a speaker's identity from speech has become popular in the last few years. Among these methods, the x-vector extractor, which is based on time-delay neural networks (TDNN), performs better at noise canceling and generally achieves higher accuracy than earlier methods such as the Gaussian mixture model (GMM) and support vector machines (SVM). This paper presents a system-on-chip (SoC) composed of a RISC-V CPU and a neural network accelerator module for x-vector-based speaker verification (SV). To ensure real-time latency and enable implementation on edge devices, this work processes the x-vector model in three steps: size reduction, pruning, and compression. The data flow is optimized to exploit sparsity. In place of the conventional sparse matrix compression method, compressed sparse row (CSR), we propose the binary pointer compressed sparse row (BPCSR) method, which significantly improves latency and avoids the load balancing issue in each processing element (PE). We further design the neural network accelerator module to store the compressed parameters and compute the x-vector extractor, while the RISC-V CPU handles the remaining calculations, such as feature extraction and the classifier. The system was tested on the VoxCeleb dataset, containing 1251 test speakers, and achieved over 95% accuracy. Lastly, we synthesized the chip in TSMC 90 nm technology. It occupies an area of 15.5 mm² and consumes 97.88 mW for real-time identification.
Pages: 165482-165496
Number of pages: 15
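
The abstract contrasts the proposed BPCSR format with conventional compressed sparse row (CSR) storage. The record does not describe how BPCSR is laid out, so the following is only a minimal sketch of the conventional CSR baseline, not the authors' method. The function names (`to_csr`, `csr_matvec`, `nonzeros_per_row`) and the small matrix `W` are hypothetical, chosen for illustration. The per-row nonzero counts hint at why plain CSR can leave processing elements unevenly loaded when each one is assigned a row.

```python
# Illustrative CSR baseline only; BPCSR itself is not specified in this record.

def to_csr(dense):
    """Convert a dense row-major matrix (list of lists) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)    # nonzero weight
                col_idx.append(j)   # its column index
        row_ptr.append(len(values))  # where the next row starts
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = W @ x, touching only the stored nonzeros."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def nonzeros_per_row(row_ptr):
    """Work per row; uneven counts mean uneven load if one PE handles one row."""
    return [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]


# Hypothetical pruned weight matrix with an uneven sparsity pattern.
W = [
    [0.0, 2.0, 0.0, 0.0],
    [1.0, 0.0, 3.0, 4.0],
    [0.0, 0.0, 0.0, 0.0],
]
values, col_idx, row_ptr = to_csr(W)
y = csr_matvec(values, col_idx, row_ptr, [1.0, 1.0, 1.0, 1.0])
work = nonzeros_per_row(row_ptr)
```

In this toy case the second row carries three of the four nonzeros while the third row carries none, which is the kind of imbalance the abstract says BPCSR is designed to avoid.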