SRU++: PIONEERING FAST RECURRENCE WITH ATTENTION FOR SPEECH RECOGNITION

Cited by: 1

Authors:

Pan, Jing [1]

Lei, Tao [1]

Kim, Kwangyoun [1]

Han, Kyu J. [1]

Watanabe, Shinji [2]

Affiliations:

[1] ASAPP Inc, Mountain View, CA 94043 USA

[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022

Keywords:

speech recognition; SRU++; attention; recurrent neural network

DOI:

10.1109/ICASSP43922.2022.9746187

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

The Transformer architecture has been well adopted as a dominant architecture in most sequence transduction tasks including automatic speech recognition (ASR), since its attention mechanism excels in capturing long-range dependencies. While models built solely upon attention can be better parallelized than regular RNN, a novel network architecture, SRU++, was recently proposed. By combining the fast recurrence and attention mechanism, SRU++ exhibits strong capability in sequence modeling and achieves near-state-of-the-art results in various language modeling and machine translation tasks with improved compute efficiency. In this work, we present the advantages of applying SRU++ in ASR tasks by comparing with Conformer across multiple ASR benchmarks and study how the benefits can be generalized to long-form speech inputs. On the popular LibriSpeech benchmark, our SRU++ model achieves 2.0% / 4.7% WER on test-clean / test-other, showing competitive performances compared with the state-of-the-art Conformer encoder under the same set-up. Specifically, SRU++ can surpass Conformer on long-form speech input with a large margin, based on our analysis.


Pages: 7872-7876

Page count: 5
