STREAMING END-TO-END SPEECH RECOGNITION FOR MOBILE DEVICES

Authors
He, Yanzhang [1]
Sainath, Tara N. [1]
Prabhavalkar, Rohit [1]
McGraw, Ian [1]
Alvarez, Raziel [1]
Zhao, Ding [1]
Rybach, David [1]
Kannan, Anjuli [1]
Wu, Yonghui [1]
Pang, Ruoming [1]
Liang, Qiao [1]
Bhatia, Deepti [1]
Shangguan, Yuan [1]
Li, Bo [1]
Pundak, Golan [1]
Sim, Khe Chai [1]
Bagby, Tom [1]
Chang, Shuo-yiin [1]
Rao, Kanishka [1]
Gruenstein, Alexander [1]
Affiliations
[1] Google Inc, Mountain View, CA 94043 USA
DOI
10.1109/icassp.2019.8682336
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
End-to-end (E2E) models, which directly predict output character sequences given input speech, are good candidates for on-device speech recognition. E2E models, however, present numerous challenges: In order to be truly useful, such models must decode speech utterances in a streaming fashion, in real time; they must be robust to the long tail of use cases; they must be able to leverage user-specific context (e.g., contact lists); and above all, they must be extremely accurate. In this work, we describe our efforts at building an E2E speech recognizer using a recurrent neural network transducer. In experimental evaluations, we find that the proposed approach can outperform a conventional CTC-based model in terms of both latency and accuracy in a number of evaluation categories.
Pages: 6381-6385
Page count: 5
Related papers
50 in total
  • [1] Review of End-to-End Streaming Speech Recognition
    Wang, Aohui
    Zhang, Long
    Song, Wenyu
    Meng, Jie
    Computer Engineering and Applications, 2024, 59 (02) : 22 - 33
  • [2] PERSONALIZATION OF END-TO-END SPEECH RECOGNITION ON MOBILE DEVICES FOR NAMED ENTITIES
    Sim, Khe Chai
    Beaufays, Francoise
    Guliani, Arnaud Benard Dhruv
    Kabel, Andreas
    Khare, Nikhil
    Lucassen, Tamar
    Zadrazil, Petr
    Zhang, Harry
    Johnson, Leif
    Motta, Giovanni
    Zhou, Lillian
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 23 - 30
  • [3] Transformer Model Compression for End-to-End Speech Recognition on Mobile Devices
    Ben Letaifa, Leila
    Rouas, Jean-Luc
    2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022, : 439 - 443
  • [4] Lightweight End-to-End Architecture for Streaming Speech Recognition
    Yang S.
    Li X.
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2023, 36 (03) : 268 - 279
  • [5] Streaming End-to-End Multi-Talker Speech Recognition
    Lu, Liang
    Kanda, Naoyuki
    Li, Jinyu
    Gong, Yifan
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 803 - 807
  • [6] Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification
    Zhang, C.
    Li, B.
    Sainath, T. N.
    Strohman, T.
    Mavandadi, S.
    Chang, S.
    Haghani, P.
    INTERSPEECH 2022, 2022, : 3223 - 3227
  • [7] Low Latency End-to-End Streaming Speech Recognition with a Scout Network
    Wang, Chengyi
    Wu, Yu
    Lu, Liang
    Liu, Shujie
    Li, Jinyu
    Ye, Guoli
    Zhou, Ming
    INTERSPEECH 2020, 2020, : 2112 - 2116
  • [8] Transfer Learning Approaches for Streaming End-to-End Speech Recognition System
    Joshi, Vikas
    Zhao, Rui
    Mehta, Rupesh R.
    Kumar, Kshitiz
    Li, Jinyu
    INTERSPEECH 2020, 2020, : 2152 - 2156
  • [9] A Lightweight End-to-End Speech Recognition System on Embedded Devices
    Wang, Yu
    Nishizaki, Hiromitsu
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2023, E106D (07) : 1230 - 1239
  • [10] WeNet: Production Oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit
    Yao, Zhuoyuan
    Wu, Di
    Wang, Xiong
    Zhang, Binbin
    Yu, Fan
    Yang, Chao
    Peng, Zhendong
    Chen, Xiaoyu
    Xie, Lei
    Lei, Xin
    INTERSPEECH 2021, 2021, : 4054 - 4058