SRU++: PIONEERING FAST RECURRENCE WITH ATTENTION FOR SPEECH RECOGNITION

Times cited: 1
Authors
Pan, Jing [1]
Lei, Tao [1]
Kim, Kwangyoun [1]
Han, Kyu J. [1]
Watanabe, Shinji [2]
Affiliations
[1] ASAPP Inc, Mountain View, CA 94043 USA
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
speech recognition; SRU++; attention; recurrent neural network
DOI
10.1109/ICASSP43922.2022.9746187
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
The Transformer architecture has been widely adopted as the dominant architecture in most sequence transduction tasks, including automatic speech recognition (ASR), because its attention mechanism excels at capturing long-range dependencies. Models built solely on attention can be parallelized better than regular RNNs. SRU++, a recently proposed architecture, instead combines fast recurrence with attention; it exhibits strong sequence-modeling capability and achieves near-state-of-the-art results on various language modeling and machine translation tasks with improved compute efficiency. In this work, we present the advantages of applying SRU++ to ASR by comparing it with Conformer across multiple ASR benchmarks, and we study how these benefits generalize to long-form speech inputs. On the popular LibriSpeech benchmark, our SRU++ model achieves 2.0% / 4.7% WER on test-clean / test-other, performance competitive with the state-of-the-art Conformer encoder under the same setup. In particular, our analysis shows that SRU++ can surpass Conformer by a large margin on long-form speech input.
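The abstract characterizes SRU++ only as combining fast recurrence with attention. As an illustration, here is a minimal pure-Python sketch of one SRU++-style layer, assuming the formulation of the original SRU++ language-modeling work as we understand it: single-head self-attention produces the per-step inputs of an SRU recurrence whose gates interact with the previous cell state only elementwise. The helper names (`matvec`, `self_attention`, `init_params`, `sru_pp_layer`), the dimensions, the residual weight `alpha`, and the random initialization are hypothetical; layer normalization, multiple heads, bidirectionality, and other details of the authors' ASR encoder are omitted and may differ.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X, Wq, Wk, Wv):
    """Single-head, non-causal scaled dot-product attention over X.

    Keys and values are projected from the queries (not from X directly),
    which is our reading of the SRU++ design (an assumption).
    """
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, q) for q in Q]
    V = [matvec(Wv, q) for q in Q]
    scale = math.sqrt(len(Q[0]))
    A = []
    for q in Q:
        weights = softmax([sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K])
        A.append([sum(w * v[i] for w, v in zip(weights, V)) for i in range(len(V[0]))])
    return Q, A


def init_params(d, d_attn, seed=0):
    """Hypothetical random initialization for illustration; real models are trained."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.3, 0.3) for _ in range(cols)] for _ in range(rows)]

    return {
        "Wq": mat(d_attn, d),
        "Wk": mat(d_attn, d_attn),
        "Wv": mat(d_attn, d_attn),
        "Wo": mat(3 * d, d_attn),  # maps to the three per-step inputs of the recurrence
        "vf": [rng.uniform(-0.3, 0.3) for _ in range(d)],
        "vr": [rng.uniform(-0.3, 0.3) for _ in range(d)],
        "bf": [0.0] * d,
        "br": [0.0] * d,
    }


def sru_pp_layer(X, p, alpha=1.0):
    """One SRU++-style layer over X (a list of d-dim vectors); returns hidden states."""
    d = len(X[0])
    # Attention takes the place of SRU's plain linear projection of x_t. None of
    # these matrix products depend on the recurrent state, so they could be
    # computed for all time steps in parallel.
    Q, A = self_attention(X, p["Wq"], p["Wk"], p["Wv"])
    U = [matvec(p["Wo"], [qi + alpha * ai for qi, ai in zip(q, a)]) for q, a in zip(Q, A)]

    c = [0.0] * d  # cell state c_0
    H = []
    for x, u in zip(X, U):
        u_x, u_f, u_r = u[:d], u[d:2 * d], u[2 * d:]
        # Only elementwise operations remain in the sequential loop;
        # the gates use the previous cell state c_{t-1}.
        f = [sigmoid(u_f[i] + p["vf"][i] * c[i] + p["bf"][i]) for i in range(d)]
        r = [sigmoid(u_r[i] + p["vr"][i] * c[i] + p["br"][i]) for i in range(d)]
        c = [f[i] * c[i] + (1.0 - f[i]) * u_x[i] for i in range(d)]
        # Highway connection: mix the new cell state c_t with the raw input x_t.
        H.append([r[i] * c[i] + (1.0 - r[i]) * x[i] for i in range(d)])
    return H


# Usage: 5 frames of 8-dim features through one layer with a 4-dim attention space.
rng = random.Random(1)
X = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(5)]
H = sru_pp_layer(X, init_params(d=8, d_attn=4))
print(len(H), len(H[0]))  # → 5 8
```

The sketch is arranged so that every matrix product happens outside the time loop and the sequential part reduces to cheap elementwise updates, which is the intuition behind the "fast recurrence" in the title.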
Pages: 7872-7876
Number of pages: 5
Related papers
50 in total
  • [21] Summary++: Summarizing Chinese News Articles with Attention
    Zhao, Juan
    Chung, Tong Lee
    Xu, Bin
    Jiang, Minghu
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2018, PT II, 2018, 11109 : 27 - 37
  • [22] Research on lip recognition algorithm based on MobileNet plus attention-GRU
    Lu, Yuanyao
    Li, Kexin
    MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2022, 19 (12) : 13526 - 13540
  • [23] SRU-Net: a novel spatiotemporal attention network for sclera segmentation and recognition
    Mashayekhbakhsh, Tara
    Meshgini, Saeed
    Rezaii, Tohid Yousefi
    Makouei, Somayeh
    PATTERN ANALYSIS AND APPLICATIONS, 2024, 27 (03)
  • [24] Improvements on Speech Recognition for Fast Speech
    Lee, Ki-Seung
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2006, 25 (02) : 88 - 95
  • [25] DBSCAN++: Towards fast and scalable density clustering
    Jang, Jennifer
    Jiang, Heinrich
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [26] A Pruning Optimized Fast Learn++NSE Algorithm
    Chen, Yong
    Zhu, Yuquan
    Chen, Haifeng
    Shen, Yan
    Xu, Zhao
    IEEE ACCESS, 2021, 9 : 150733 - 150743
  • [27] RangeNet++: Fast and Accurate LiDAR Semantic Segmentation
    Milioto, Andres
    Vizzo, Ignacio
    Behley, Jens
    Stachniss, Cyrill
    2019 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2019 : 4213 - 4220
  • [28] Distributed C++-Python embedding for fast predictions and fast prototyping
    Varisteas, Georgios
    Avanesov, Tigran
    State, Radu
    DIDL'18: PROCEEDINGS OF THE SECOND WORKSHOP ON DISTRIBUTED INFRASTRUCTURES FOR DEEP LEARNING, 2018 : 9 - 14
  • [29] MAGSAC++, a fast, reliable and accurate robust estimator
    Barath, Daniel
    Noskova, Jana
    Ivashechkin, Maksym
    Matas, Jiri
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020 : 1301 - 1309
  • [30] TurboGraph++: A Scalable and Fast Graph Analytics System
    Ko, Seongyun
    Han, Wook-Shin
    SIGMOD'18: PROCEEDINGS OF THE 2018 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2018 : 395 - 410