Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory

Cited by: 19
Authors
Wu, Chunyang [1]
Wang, Yongqiang [1]
Shi, Yangyang [1]
Yeh, Ching-Feng [1]
Zhang, Frank [1]
Affiliations
[1] Facebook AI, Menlo Park, CA 94025, USA
Source
INTERSPEECH 2020
Keywords
streaming speech recognition; transformer; acoustic modeling
DOI
10.21437/Interspeech.2020-2079
CLC numbers
R36 [pathology]; R76 [otorhinolaryngology]
Discipline codes
100104; 100213
Abstract
Transformer-based acoustic modeling has achieved great success for both hybrid and sequence-to-sequence speech recognition. However, it requires access to the full input sequence, and its computational cost grows quadratically with the input sequence length; these factors limit its adoption for streaming applications. In this work, we propose a novel augmented-memory self-attention, which attends to a short segment of the input sequence and a bank of memories. The memory bank stores an embedding for each previously processed segment. On the LibriSpeech benchmark, the proposed method outperforms all existing streamable Transformer methods by a large margin and achieves over 15% relative error reduction compared with the widely used LC-BLSTM baseline. Our findings are also confirmed on several large internal datasets.
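To make the mechanism described in the abstract concrete, the following is a minimal PyTorch sketch of augmented-memory self-attention. It is a sketch under simplifying assumptions, not the authors' implementation: the class name AugmentedMemoryAttention is hypothetical, it uses a single attention head, omits the left/right context frames, and compresses each segment into its memory slot by mean pooling, whereas the paper uses multi-head attention and a dedicated summarization query per segment.

# Minimal sketch of augmented-memory self-attention; simplifying
# assumptions noted above (single head, no context frames, mean-pooled
# memory summary). Not the authors' code.
import torch
import torch.nn.functional as F

class AugmentedMemoryAttention(torch.nn.Module):  # hypothetical name
    def __init__(self, d_model, segment_size):
        super().__init__()
        self.segment_size = segment_size
        self.q_proj = torch.nn.Linear(d_model, d_model)
        self.k_proj = torch.nn.Linear(d_model, d_model)
        self.v_proj = torch.nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, x):
        # x: (T, d_model). Frames are consumed segment by segment, so each
        # step attends over segment + memory bank rather than all T frames.
        outputs, memory = [], []  # memory: one embedding per past segment
        for seg in x.split(self.segment_size, dim=0):
            # Keys/values cover the memory bank plus the current segment.
            ctx = torch.cat(memory + [seg], dim=0)
            q = self.q_proj(seg)
            k = self.k_proj(ctx)
            v = self.v_proj(ctx)
            attn = F.softmax((q @ k.T) * self.scale, dim=-1)
            out = attn @ v
            outputs.append(out)
            # Compress the segment into a single new memory slot (mean
            # pooling here; the paper uses a summarization query instead).
            memory.append(out.mean(dim=0, keepdim=True))
        return torch.cat(outputs, dim=0)

# Usage: stream a 1000-frame, 256-dim feature sequence in 64-frame segments.
layer = AugmentedMemoryAttention(d_model=256, segment_size=64)
y = layer(torch.randn(1000, 256))  # y: (1000, 256)

Because each query attends only to its own segment plus one slot per past segment, the per-segment cost no longer grows quadratically with utterance length, which is what makes the layer streamable.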
Pages: 2132 - 2136
Number of pages: 5
Related papers
50 items in total
  • [41] Global-Local Self-Attention Based Transformer for Speaker Verification
    Xie, Fei
    Zhang, Dalong
    Liu, Chengming
    APPLIED SCIENCES-BASEL, 2022, 12 (19)
  • [42] Image captioning using transformer-based double attention network
    Parvin, Hashem
    Naghsh-Nilchi, Ahmad Reza
    Mohammadi, Hossein Mahvash
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 125
  • [43] Traffic Prediction for Optical Fronthaul Network Using Self-Attention Mechanism-Based Transformer
    Zhao, Xujun
    Wu, Yonghan
    Hao, Xue
    Zhang, Lifang
    Wang, Danshi
    Zhang, Min
    2022 ASIA COMMUNICATIONS AND PHOTONICS CONFERENCE, ACP, 2022: 1207 - 1210
  • [44] Self-Attention Network for Session-Based Recommendation With Streaming Data Input
    Sun, Shiming
    Tang, Yuanhe
    Dai, Zemei
    Zhou, Fu
    IEEE ACCESS, 2019, 7: 110499 - 110509
  • [45] Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention
    Pan, Xuran
    Ye, Tianzhu
    Xia, Zhuofan
    Song, Shiji
    Huang, Gao
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023: 2082 - 2091
  • [46] Gaze estimation via self-attention augmented convolutions
    Vieira, Gabriel Lefundes
    Oliveira, Luciano
    2021 34TH SIBGRAPI CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI 2021), 2021: 49 - 56
  • [47] Attention to Emotions: Body Emotion Recognition In-the-Wild Using Self-attention Transformer Network
    Paiva, Pedro V. V.
    Ramos, Josue J. G.
    Gavrilova, Marina
    Carvalho, Marco A. G.
    COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS, VISIGRAPP 2023, 2024, 2103: 206 - 228
  • [48] Local self-attention in transformer for visual question answering
    Shen, Xiang
    Han, Dezhi
    Guo, Zihan
    Chen, Chongqing
    Hua, Jie
    Luo, Gaofeng
    APPLIED INTELLIGENCE, 2023, 53 (13): 16706 - 16723
  • [49] Self-regularised Minimum Latency Training for Streaming Transformer-based Speech Recognition
    Li, Mohan
    Doddipatla, Rama Sanand
    Zorila, Catalin
    INTERSPEECH 2022, 2022, : 2088 - 2092