SRU++: PIONEERING FAST RECURRENCE WITH ATTENTION FOR SPEECH RECOGNITION

Cited by: 1
Authors
Pan, Jing [1]
Lei, Tao [1]
Kim, Kwangyoun [1]
Han, Kyu J. [1]
Watanabe, Shinji [2]
Affiliations
[1] ASAPP Inc, Mountain View, CA 94043 USA
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
speech recognition; SRU++; attention; recurrent neural network
DOI
10.1109/ICASSP43922.2022.9746187
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
The Transformer architecture has been widely adopted as the dominant architecture in most sequence transduction tasks, including automatic speech recognition (ASR), since its attention mechanism excels at capturing long-range dependencies. While models built solely upon attention can be better parallelized than regular RNNs, a novel network architecture, SRU++, was recently proposed. By combining fast recurrence with an attention mechanism, SRU++ exhibits strong sequence modeling capability and achieves near-state-of-the-art results in various language modeling and machine translation tasks with improved compute efficiency. In this work, we present the advantages of applying SRU++ to ASR tasks by comparing it with Conformer across multiple ASR benchmarks, and we study how the benefits generalize to long-form speech inputs. On the popular LibriSpeech benchmark, our SRU++ model achieves 2.0% / 4.7% WER on test-clean / test-other, showing competitive performance compared with the state-of-the-art Conformer encoder under the same setup. In particular, our analysis shows that SRU++ can surpass Conformer by a large margin on long-form speech input.
Pages: 7872-7876
Page count: 5