SRU++: PIONEERING FAST RECURRENCE WITH ATTENTION FOR SPEECH RECOGNITION

Cited by: 1
Authors
Pan, Jing [1]
Lei, Tao [1]
Kim, Kwangyoun [1]
Han, Kyu J. [1]
Watanabe, Shinji [2]
Affiliations
[1] ASAPP Inc, Mountain View, CA 94043 USA
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
speech recognition; SRU++; attention; recurrent neural network
DOI
10.1109/ICASSP43922.2022.9746187
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The Transformer architecture has been widely adopted as a dominant architecture in most sequence transduction tasks including automatic speech recognition (ASR), since its attention mechanism excels in capturing long-range dependencies. While models built solely upon attention can be better parallelized than regular RNNs, a novel network architecture, SRU++, was recently proposed. By combining fast recurrence with an attention mechanism, SRU++ exhibits strong capability in sequence modeling and achieves near-state-of-the-art results in various language modeling and machine translation tasks with improved compute efficiency. In this work, we present the advantages of applying SRU++ in ASR tasks by comparing it with Conformer across multiple ASR benchmarks, and we study how the benefits generalize to long-form speech inputs. On the popular LibriSpeech benchmark, our SRU++ model achieves 2.0% / 4.7% WER on test-clean / test-other, showing competitive performance compared with the state-of-the-art Conformer encoder under the same set-up. Specifically, SRU++ can surpass Conformer on long-form speech input by a large margin, based on our analysis.
Pages: 7872-7876
Page count: 5