SRU++: PIONEERING FAST RECURRENCE WITH ATTENTION FOR SPEECH RECOGNITION

Cited by: 1
Authors
Pan, Jing [1 ]
Lei, Tao [1 ]
Kim, Kwangyoun [1 ]
Han, Kyu J. [1 ]
Watanabe, Shinji [2 ]
Affiliations
[1] ASAPP Inc, Mountain View, CA 94043 USA
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
speech recognition; SRU++; attention; recurrent neural network
DOI
10.1109/ICASSP43922.2022.9746187
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject Classification
070206; 082403
Abstract
The Transformer architecture has been widely adopted as the dominant architecture in most sequence transduction tasks, including automatic speech recognition (ASR), since its attention mechanism excels at capturing long-range dependencies. Although models built solely upon attention can be better parallelized than regular RNNs, a novel network architecture, SRU++, was recently proposed. By combining fast recurrence with an attention mechanism, SRU++ exhibits strong sequence modeling capability and achieves near-state-of-the-art results on various language modeling and machine translation tasks with improved compute efficiency. In this work, we present the advantages of applying SRU++ to ASR tasks by comparing it with Conformer across multiple ASR benchmarks, and study how its benefits generalize to long-form speech inputs. On the popular LibriSpeech benchmark, our SRU++ model achieves 2.0% / 4.7% WER on test-clean / test-other, competitive with the state-of-the-art Conformer encoder under the same set-up. In particular, our analysis shows that SRU++ can surpass Conformer on long-form speech input by a large margin.
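For context, the "fast recurrence" the abstract refers to is the elementwise SRU recurrence that SRU++ builds on: the heavy matrix multiplications are time-independent and can be batched across the whole sequence, leaving only cheap elementwise gating in the sequential loop. Below is a minimal NumPy sketch of that recurrence; parameter names are illustrative, it assumes input and hidden dimensions are equal so the highway connection lines up, and the actual SRU++ layer additionally replaces the input projections with a self-attention block, which is omitted here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sru_cell(x_seq, W, Wf, Wr, vf, vr, bf, br):
    """Sketch of the SRU elementwise recurrence.

    x_seq: (T, d) input sequence; all weight matrices are (d, d)
    and the v*/b* vectors are (d,). Illustrative only.
    """
    T, d = x_seq.shape
    # The projections do not depend on the recurrent state,
    # so they are computed for all time steps at once.
    U = x_seq @ W.T    # candidate input
    Uf = x_seq @ Wf.T  # forget-gate input
    Ur = x_seq @ Wr.T  # reset-gate input
    c = np.zeros(d)    # internal state
    outputs = []
    for t in range(T):
        # Only cheap elementwise ops remain in the sequential loop.
        f = sigmoid(Uf[t] + vf * c + bf)   # forget gate
        r = sigmoid(Ur[t] + vr * c + br)   # reset gate
        c = f * c + (1.0 - f) * U[t]       # state update
        h = r * c + (1.0 - r) * x_seq[t]   # highway (skip) output
        outputs.append(h)
    return np.stack(outputs)
```

Because the per-step work is elementwise, this recurrence parallelizes far better than an LSTM, which is the property SRU++ pairs with attention.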
Pages: 7872 - 7876
Number of pages: 5
Related Papers
50 items
  • [41] SpiralNet++: A Fast and Highly Efficient Mesh Convolution Operator
    Gong, Shunwang
    Chen, Lei
    Bronstein, Michael
    Zafeiriou, Stefanos
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 4141 - 4148
  • [42] Fast Scalable k-means++ Algorithm with MapReduce
    Xu, Yujie
    Qu, Wenyu
    Li, Zhiyang
    Ji, Changqing
    Li, Yuanyuan
    Wu, Yinan
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2014, PT II, 2014, 8631 : 15 - 28
  • [43] DirectVoxGO++: Fast Neural Radiance Fields for Object Reconstruction
    Perazzo, Daniel
    Lima, Joao Paulo
    Velho, Luiz
    Teichrieb, Veronica
    2022 35TH SIBGRAPI CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI 2022), 2022, : 156 - 161
  • [44] Fast distributed compilation and testing of large C++ projects
    Matev, Rosen
    24TH INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY AND NUCLEAR PHYSICS (CHEP 2019), 2020, 245
  • [45] WPD++: AN IMPROVED NEURAL BEAMFORMER FOR SIMULTANEOUS SPEECH SEPARATION AND DEREVERBERATION
    Ni, Zhaoheng
    Xu, Yong
    Yu, Meng
    Wu, Bo
    Zhang, Shixiong
    Yu, Dong
    Mandel, Michael I.
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 817 - 824
  • [46] Modified UNet++ with attention gate for graphene identification by optical microscopy
    Yang, Bin
    Wu, Mengxi
    Teizer, Winfried
    CARBON, 2022, 195 : 246 - 252
  • [47] TagRec++: Hierarchical Label Aware Attention Network for Question Categorization
    Venktesh, V.
    Mohania, Mukesh
    Goyal, Vikram
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (07) : 3529 - 3540
  • [48] Robust recognition of fast speech
    Lee, Ki-Seung
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2006, E89D (08) : 2456 - 2459
  • [49] MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition
    Komorowski, Jacek
    Wysoczanska, Monika
    Trzcinski, Tomasz
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [50] OGER++: hybrid multi-type entity recognition
    Furrer, Lenz
    Jancso, Anna
    Colic, Nicola
    Rinaldi, Fabio
    JOURNAL OF CHEMINFORMATICS, 2019, 11 (1)