STREAMING END-TO-END SPEECH RECOGNITION FOR MOBILE DEVICES

Authors
He, Yanzhang [1]
Sainath, Tara N. [1]
Prabhavalkar, Rohit [1]
McGraw, Ian [1]
Alvarez, Raziel [1]
Zhao, Ding [1]
Rybach, David [1]
Kannan, Anjuli [1]
Wu, Yonghui [1]
Pang, Ruoming [1]
Liang, Qiao [1]
Bhatia, Deepti [1]
Shangguan, Yuan [1]
Li, Bo [1]
Pundak, Golan [1]
Sim, Khe Chai [1]
Bagby, Tom [1]
Chang, Shuo-yiin [1]
Rao, Kanishka [1]
Gruenstein, Alexander [1]
Affiliation
[1] Google Inc, Mountain View, CA 94043 USA
DOI
10.1109/icassp.2019.8682336
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
End-to-end (E2E) models, which directly predict output character sequences given input speech, are good candidates for on-device speech recognition. E2E models, however, present numerous challenges: In order to be truly useful, such models must decode speech utterances in a streaming fashion, in real time; they must be robust to the long tail of use cases; they must be able to leverage user-specific context (e.g., contact lists); and above all, they must be extremely accurate. In this work, we describe our efforts at building an E2E speech recognizer using a recurrent neural network transducer. In experimental evaluations, we find that the proposed approach can outperform a conventional CTC-based model in terms of both latency and accuracy in a number of evaluation categories.
Pages: 6381-6385
Page count: 5
Related papers
50 in total
  • [21] END-TO-END MULTIMODAL SPEECH RECOGNITION
    Palaskar, Shruti
    Sanabria, Ramon
    Metze, Florian
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018: 5774-5778
  • [22] Overview of end-to-end speech recognition
    Wang, Song
    Li, Guanyu
    2018 INTERNATIONAL SYMPOSIUM ON POWER ELECTRONICS AND CONTROL ENGINEERING (ISPECE 2018), 2019, 1187
  • [23] End-to-end Accented Speech Recognition
    Viglino, Thibault
    Motlicek, Petr
    Cernak, Milos
    INTERSPEECH 2019, 2019: 2140-2144
  • [24] Multichannel End-to-end Speech Recognition
    Ochiai, Tsubasa
    Watanabe, Shinji
    Hori, Takaaki
    Hershey, John R.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [25] END-TO-END AUDIOVISUAL SPEECH RECOGNITION
    Petridis, Stavros
    Stafylakis, Themos
    Ma, Pingchuan
    Cai, Feipeng
    Tzimiropoulos, Georgios
    Pantic, Maja
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018: 6548-6552
  • [26] END-TO-END ANCHORED SPEECH RECOGNITION
    Wang, Yiming
    Fan, Xing
    Chen, I-Fan
    Liu, Yuzong
    Chen, Tongfei
    Hoffmeister, Bjorn
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019: 7090-7094
  • [27] A Streaming End-to-End Speech Recognition Approach Based on WeNet for Tibetan Amdo Dialect
    Wang, Chao
    Wen, Yao
    Lhamo, Phurba
    Tashi, Nyima
    2022 5TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND NATURAL LANGUAGE PROCESSING, MLNLP 2022, 2022: 317-322
  • [28] Streaming End-to-End Speech Recognition for Hybrid RNN-T/Attention Architecture
    Moriya, Takafumi
    Tanaka, Tomohiro
    Ashihara, Takanori
    Ochiai, Tsubasa
    Sato, Hiroshi
    Ando, Atsushi
    Masumura, Ryo
    Delcroix, Marc
    Asami, Taichi
    INTERSPEECH 2021, 2021: 1787-1791
  • [29] STREAMING ATTENTION-BASED MODELS WITH AUGMENTED MEMORY FOR END-TO-END SPEECH RECOGNITION
    Yeh, Ching-Feng
    Wang, Yongqiang
    Shi, Yangyang
    Wu, Chunyang
    Zhang, Frank
    Chan, Julian
    Seltzer, Michael L.
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021: 8-14
  • [30] Streaming End-to-End Target-Speaker Automatic Speech Recognition and Activity Detection
    Moriya, Takafumi
    Sato, Hiroshi
    Ochiai, Tsubasa
    Delcroix, Marc
    Shinozaki, Takahiro
    IEEE ACCESS, 2023, 11: 13906-13917