STREAMING END-TO-END SPEECH RECOGNITION FOR MOBILE DEVICES

Cited by: 0

Authors:

He, Yanzhang [1]
Sainath, Tara N. [1]
Prabhavalkar, Rohit [1]
McGraw, Ian [1]
Alvarez, Raziel [1]
Zhao, Ding [1]
Rybach, David [1]
Kannan, Anjuli [1]
Wu, Yonghui [1]
Pang, Ruoming [1]
Liang, Qiao [1]
Bhatia, Deepti [1]
Shangguan, Yuan [1]
Li, Bo [1]
Pundak, Golan [1]
Sim, Khe Chai [1]
Bagby, Tom [1]
Chang, Shuo-yiin [1]
Rao, Kanishka [1]
Gruenstein, Alexander [1]

Affiliation:

[1] Google Inc, Mountain View, CA 94043 USA

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019

Keywords:

DOI:

10.1109/icassp.2019.8682336

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

End-to-end (E2E) models, which directly predict output character sequences given input speech, are good candidates for on-device speech recognition. E2E models, however, present numerous challenges: In order to be truly useful, such models must decode speech utterances in a streaming fashion, in real time; they must be robust to the long tail of use cases; they must be able to leverage user-specific context (e.g., contact lists); and above all, they must be extremely accurate. In this work, we describe our efforts at building an E2E speech recognizer using a recurrent neural network transducer. In experimental evaluations, we find that the proposed approach can outperform a conventional CTC-based model in terms of both latency and accuracy in a number of evaluation categories.


Pages: 6381-6385

Number of pages: 5

