Performance Evaluation of Offline Speech Recognition on Edge Devices

Cited by: 11
Authors
Gondi, Santosh [1]
Pratap, Vineel [2]
Affiliations
[1] Facebook Inc, Menlo Park, CA 94025 USA
[2] Facebook AI Research, Menlo Park, CA 94025 USA
Keywords
ASR; speech-to-text; edge AI; Wav2Vec; transformers; PyTorch; neural networks
DOI
10.3390/electronics10212697
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Deep learning-based speech recognition has made great strides in the past decade. These systems have evolved to achieve higher accuracy with simpler end-to-end architectures than their hybrid predecessors. Most state-of-the-art systems run on backend servers with large amounts of memory and CPU/GPU resources. The major disadvantage of server-based speech recognition is the lack of privacy and security for user speech data. Additionally, because it depends on the network, a server-based architecture cannot always be reliable, performant, and available. Offline speech recognition on client devices overcomes these issues. However, resource constraints on smaller edge devices may make it difficult to achieve state-of-the-art speech recognition results. In this paper, we evaluate the performance and efficiency of transformer-based speech recognition systems on edge devices. We evaluate inference performance on two popular edge devices, Raspberry Pi and Nvidia Jetson Nano, running on CPU and GPU, respectively. We conclude that with PyTorch mobile optimization and quantization, the models can achieve real-time inference on the Raspberry Pi CPU with a small degradation in word error rate. On the Jetson Nano GPU, inference latency is three to five times lower than on the Raspberry Pi. The word error rate on the edge is still higher than that of server inference, but not by a wide margin.
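The abstract reports results in terms of word error rate and real-time inference. As a minimal sketch of how such metrics are commonly computed (not the authors' evaluation code), the Python below defines two hypothetical helpers, `word_error_rate` and `real_time_factor`. It assumes the usual definitions: word error rate is the word-level edit distance divided by the number of reference words, and inference counts as real-time when processing time divided by audio duration is below 1.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time per second of audio; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # Illustrative inputs only; these are not figures from the paper.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    rtf = real_time_factor(processing_seconds=2.0, audio_seconds=4.0)
    print(f"WER: {wer:.3f}, RTF: {rtf:.2f}")
```

The example strings and timings are made up for illustration. A real evaluation would also normalize case and punctuation before scoring, which this sketch omits.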
Pages: 16