STREAMING END-TO-END SPEECH RECOGNITION FOR MOBILE DEVICES

Cited by: 0
Authors
He, Yanzhang [1 ]
Sainath, Tara N. [1 ]
Prabhavalkar, Rohit [1 ]
McGraw, Ian [1 ]
Alvarez, Raziel [1 ]
Zhao, Ding [1 ]
Rybach, David [1 ]
Kannan, Anjuli [1 ]
Wu, Yonghui [1 ]
Pang, Ruoming [1 ]
Liang, Qiao [1 ]
Bhatia, Deepti [1 ]
Shangguan, Yuan [1 ]
Li, Bo [1 ]
Pundak, Golan [1 ]
Sim, Khe Chai [1 ]
Bagby, Tom [1 ]
Chang, Shuo-yiin [1 ]
Rao, Kanishka [1 ]
Gruenstein, Alexander [1 ]
Affiliations
[1] Google Inc, Mountain View, CA 94043 USA
Keywords
DOI
10.1109/icassp.2019.8682336
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
End-to-end (E2E) models, which directly predict output character sequences given input speech, are good candidates for on-device speech recognition. E2E models, however, present numerous challenges: In order to be truly useful, such models must decode speech utterances in a streaming fashion, in real time; they must be robust to the long tail of use cases; they must be able to leverage user-specific context (e.g., contact lists); and above all, they must be extremely accurate. In this work, we describe our efforts at building an E2E speech recognizer using a recurrent neural network transducer. In experimental evaluations, we find that the proposed approach can outperform a conventional CTC-based model in terms of both latency and accuracy in a number of evaluation categories.
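The abstract describes a streaming recognizer built on the recurrent neural network transducer (RNN-T). As a rough illustration of how such a model consumes audio frame by frame and emits output symbols without waiting for the end of the utterance, the Python sketch below wires together the three standard RNN-T components (acoustic encoder, label prediction network, joint network) with toy dimensions, a vanilla RNN cell, and a greedy decoding loop. All of these choices are illustrative assumptions for exposition only; they are not the architecture, layer sizes, or decoding strategy reported in the paper.

# Minimal, illustrative sketch of an RNN-T-style streaming recognizer.
# Toy dimensions, vanilla RNN cells, and greedy decoding are assumptions;
# the paper's actual model and training setup differ.
import numpy as np

rng = np.random.default_rng(0)

VOCAB_SIZE = 32        # assumed toy output inventory (index 0 = blank)
FEAT_DIM = 80          # assumed acoustic feature dimension
HIDDEN = 64            # assumed hidden size for all networks

def init_rnn(in_dim, hid_dim):
    # Random parameters for a simple (vanilla) RNN cell.
    return {
        "W": rng.normal(0, 0.1, (hid_dim, in_dim)),
        "U": rng.normal(0, 0.1, (hid_dim, hid_dim)),
        "b": np.zeros(hid_dim),
    }

def rnn_step(params, x, h):
    # One step of a vanilla RNN: h' = tanh(W x + U h + b).
    return np.tanh(params["W"] @ x + params["U"] @ h + params["b"])

# Encoder consumes acoustic frames; prediction network consumes previous labels.
encoder = init_rnn(FEAT_DIM, HIDDEN)
predictor = init_rnn(VOCAB_SIZE, HIDDEN)
# Joint network maps (encoder state, predictor state) to vocabulary logits.
W_joint = rng.normal(0, 0.1, (VOCAB_SIZE, 2 * HIDDEN))

def one_hot(i, n):
    v = np.zeros(n)
    v[i] = 1.0
    return v

def greedy_decode(frames, max_symbols_per_frame=3):
    # Frame-synchronous greedy decoding: at each frame, emit symbols
    # until the model predicts blank, then move to the next frame.
    h_enc = np.zeros(HIDDEN)
    h_pred = np.zeros(HIDDEN)
    prev_label = 0  # start from blank as the previous label
    output = []
    for x in frames:                      # frames arrive one at a time
        h_enc = rnn_step(encoder, x, h_enc)
        for _ in range(max_symbols_per_frame):
            h_pred_cand = rnn_step(predictor, one_hot(prev_label, VOCAB_SIZE), h_pred)
            logits = W_joint @ np.concatenate([h_enc, h_pred_cand])
            k = int(np.argmax(logits))
            if k == 0:                    # blank: advance to the next frame
                break
            output.append(k)              # non-blank: emit and keep predicting
            h_pred, prev_label = h_pred_cand, k
    return output

# Example: decode 20 random "frames" with the untrained toy model.
dummy_frames = rng.normal(size=(20, FEAT_DIM))
print(greedy_decode(dummy_frames))

Note that the decoding loop only ever looks at frames that have already arrived, which is what makes frame-synchronous RNN-T decoding suitable for the streaming, on-device setting the abstract targets.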
Pages: 6381-6385
Number of pages: 5