Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory

Cited by: 19
Authors
Wu, Chunyang [1 ]
Wang, Yongqiang [1 ]
Shi, Yangyang [1 ]
Yeh, Ching-Feng [1 ]
Zhang, Frank [1 ]
Affiliations
[1] Facebook AI, Menlo Pk, CA 94025 USA
Source
INTERSPEECH 2020
Keywords
streaming speech recognition; transformer; acoustic modeling;
DOI
10.21437/Interspeech.2020-2079
Chinese Library Classification (CLC) codes
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline codes
100104; 100213
Abstract
Transformer-based acoustic modeling has achieved great success for both hybrid and sequence-to-sequence speech recognition. However, it requires access to the full input sequence, and its computational cost grows quadratically with the sequence length. These factors limit its adoption for streaming applications. In this work, we propose a novel augmented memory self-attention, which attends to a short segment of the input sequence and a bank of memories. The memory bank stores the embedding information for all previously processed segments. On the LibriSpeech benchmark, our proposed method outperforms all existing streamable transformer methods by a large margin and achieves over 15% relative error reduction compared with the widely used LC-BLSTM baseline. Our findings are also confirmed on several large internal datasets.
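The mechanism the abstract describes can be sketched roughly as follows: for each short segment, queries attend over the concatenation of the memory bank and the current segment, and the segment is then summarized into one new memory slot. This is a toy single-head NumPy illustration; the random projections and the mean-pooling summarizer are stand-in assumptions, not the paper's actual parameterization.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def augmented_memory_attention(segments, d, seed=0):
    """Toy augmented-memory self-attention over a stream of segments.

    segments: list of (seg_len, d) arrays arriving one at a time.
    Returns the per-segment attention outputs and the final memory bank.
    """
    rng = np.random.default_rng(seed)
    # Random projections stand in for learned Q/K/V weights (illustration only).
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    memory = np.zeros((0, d))  # memory bank: one embedding per processed segment
    outputs = []
    for seg in segments:
        # Keys/values cover the memory bank plus the current short segment,
        # so cost per step stays bounded instead of growing with the full input.
        ctx = np.vstack([memory, seg])
        q, k, v = seg @ Wq, ctx @ Wk, ctx @ Wv
        attn = softmax(q @ k.T / np.sqrt(d))
        outputs.append(attn @ v)
        # Summarize the processed segment into a single new memory slot
        # (mean pooling here; the paper learns this embedding).
        memory = np.vstack([memory, seg.mean(axis=0, keepdims=True)])
    return outputs, memory
```

Note that the memory bank grows by only one slot per segment, which is what keeps attention cost far below attending to the full utterance history.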
Pages: 2132-2136
Page count: 5
Related Papers
50 items in total
  • [11] Transformer-Based Dual-Channel Self-Attention for UUV Autonomous Collision Avoidance
    Lin, Changjian
    Cheng, Yuhu
    Wang, Xuesong
    Yuan, Jianya
    Wang, Guoqing
    IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, 2023, 8 (03): : 2319 - 2331
  • [12] In-Memory Transformer Self-Attention Mechanism Using Passive Memristor Crossbar
    Cai, Jack
    Kaleem, Muhammad Ahsan
    Genov, Roman
    Azghadi, Mostafa Rahimi
    Amirsoleimani, Amirali
    2024 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS 2024, 2024,
  • [13] Transformer-based end-to-end speech recognition with residual Gaussian-based self-attention
    Liang, Chengdong
    Xu, Menglong
    Zhang, Xiao-Lei
    INTERSPEECH 2021 (Proceedings of the Annual Conference of the International Speech Communication Association), 2021, 2 : 1495 - 1499
  • [14] Regularizing Transformer-based Acoustic Models by Penalizing Attention Weights for Robust Speech Recognition
    Lee, Mun-Hak
    Lee, Sang-Eon
    Seong, Ju-Seok
    Chang, Joon-Hyuk
    Kwon, Haeyoung
    Park, Chanhee
    INTERSPEECH 2022, 2022, : 56 - 60
  • [15] Self-and-Mixed Attention Decoder with Deep Acoustic Structure for Transformer-based LVCSR
    Zhou, Xinyuan
    Lee, Grandee
    Yilmaz, Emre
    Long, Yanhua
    Liang, Jiaen
    Li, Haizhou
    INTERSPEECH 2020, 2020, : 5016 - 5020
  • [16] A Multi-Head Self-Attention Transformer-Based Model for Traffic Situation Prediction in Terminal Areas
    Yu, Zhou
    Shi, Xingyu
    Zhang, Zhaoning
    IEEE ACCESS, 2023, 11 : 16156 - 16165
  • [17] ET: Re-Thinking Self-Attention for Transformer Models on GPUs
    Chen, Shiyang
    Huang, Shaoyi
    Pandey, Santosh
    Li, Bingbing
    Gao, Guang R.
    Zheng, Long
    Ding, Caiwen
    Liu, Hang
    SC21: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2021,
  • [18] GSAC-UFormer: Groupwise Self-Attention Convolutional Transformer-Based UNet for Medical Image Segmentation
    Garbaz, Anass
    Oukdach, Yassine
    Charfi, Said
    El Ansari, Mohamed
    Koutti, Lahcen
    Salihoun, Mouna
    COGNITIVE COMPUTATION, 2025, 17 (02)
  • [19] Vision Transformer Based on Reconfigurable Gaussian Self-attention
    Zhao L.
    Zhou J.-K.
    Zidonghua Xuebao/Acta Automatica Sinica, 2023, 49 (09): : 1976 - 1988
  • [20] Self-Distillation into Self-Attention Heads for Improving Transformer-based End-to-End Neural Speaker Diarization
    Jeoung, Ye-Rin
    Choi, Jeong-Hwan
    Seong, Ju-Seok
    Kyung, JeHyun
    Chang, Joon-Hyuk
    INTERSPEECH 2023, 2023, : 3197 - 3201