Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory

Cited by: 19
Authors
Wu, Chunyang [1]
Wang, Yongqiang [1]
Shi, Yangyang [1]
Yeh, Ching-Feng [1]
Zhang, Frank [1]
Affiliations
[1] Facebook AI, Menlo Park, CA 94025, USA
Source
INTERSPEECH 2020
Keywords
streaming speech recognition; transformer; acoustic modeling
DOI
10.21437/Interspeech.2020-2079
CLC numbers
R36 [pathology]; R76 [otorhinolaryngology]
Discipline codes
100104; 100213
Abstract
Transformer-based acoustic modeling has achieved great success for both hybrid and sequence-to-sequence speech recognition. However, it requires access to the full input sequence, and its computational cost grows quadratically with the input sequence length; these factors limit its adoption for streaming applications. In this work, we propose a novel augmented-memory self-attention, which attends to a short segment of the input sequence and a bank of memories. The memory bank stores an embedding for each previously processed segment. On the LibriSpeech benchmark, the proposed method outperforms all existing streamable Transformer methods by a large margin and achieves over 15% relative error reduction compared with the widely used LC-BLSTM baseline. Our findings are also confirmed on several large internal datasets.
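To make the mechanism described in the abstract concrete, the following is a minimal PyTorch sketch of augmented-memory self-attention. It is a sketch under simplifying assumptions, not the authors' implementation: the class name AugmentedMemoryAttention is hypothetical, it uses a single attention head, omits the left/right context frames, and compresses each segment into its memory slot by mean pooling, whereas the paper uses multi-head attention and a dedicated summarization query per segment.

# Minimal sketch of augmented-memory self-attention; simplifying
# assumptions noted above (single head, no context frames, mean-pooled
# memory summary). Not the authors' code.
import torch
import torch.nn.functional as F

class AugmentedMemoryAttention(torch.nn.Module):  # hypothetical name
    def __init__(self, d_model, segment_size):
        super().__init__()
        self.segment_size = segment_size
        self.q_proj = torch.nn.Linear(d_model, d_model)
        self.k_proj = torch.nn.Linear(d_model, d_model)
        self.v_proj = torch.nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, x):
        # x: (T, d_model). Frames are consumed segment by segment, so each
        # step attends over segment + memory bank rather than all T frames.
        outputs, memory = [], []  # memory: one embedding per past segment
        for seg in x.split(self.segment_size, dim=0):
            # Keys/values cover the memory bank plus the current segment.
            ctx = torch.cat(memory + [seg], dim=0)
            q = self.q_proj(seg)
            k = self.k_proj(ctx)
            v = self.v_proj(ctx)
            attn = F.softmax((q @ k.T) * self.scale, dim=-1)
            out = attn @ v
            outputs.append(out)
            # Compress the segment into a single new memory slot (mean
            # pooling here; the paper uses a summarization query instead).
            memory.append(out.mean(dim=0, keepdim=True))
        return torch.cat(outputs, dim=0)

# Usage: stream a 1000-frame, 256-dim feature sequence in 64-frame segments.
layer = AugmentedMemoryAttention(d_model=256, segment_size=64)
y = layer(torch.randn(1000, 256))  # y: (1000, 256)

Because each query attends only to its own segment plus one slot per past segment, the per-segment cost no longer grows quadratically with utterance length, which is what makes the layer streamable.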
Pages: 2132 - 2136
Number of pages: 5
Related papers
50 items in total
  • [41] Global-Local Self-Attention Based Transformer for Speaker Verification
    Xie, Fei
    Zhang, Dalong
    Liu, Chengming
    APPLIED SCIENCES-BASEL, 2022, 12 (19)
  • [42] Image captioning using transformer-based double attention network
    Parvin, Hashem
    Naghsh-Nilchi, Ahmad Reza
    Mohammadi, Hossein Mahvash
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 125
  • [43] Traffic Prediction for Optical Fronthaul Network Using Self-Attention Mechanism-Based Transformer
    Zhao, Xujun
    Wu, Yonghan
    Hao, Xue
    Zhang, Lifang
    Wang, Danshi
    Zhang, Min
    2022 ASIA COMMUNICATIONS AND PHOTONICS CONFERENCE, ACP, 2022: 1207 - 1210
  • [44] Self-Attention Network for Session-Based Recommendation With Streaming Data Input
    Sun, Shiming
    Tang, Yuanhe
    Dai, Zemei
    Zhou, Fu
    IEEE ACCESS, 2019, 7: 110499 - 110509
  • [45] Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention
    Pan, Xuran
    Ye, Tianzhu
    Xia, Zhuofan
    Song, Shiji
    Huang, Gao
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023: 2082 - 2091
  • [46] Gaze estimation via self-attention augmented convolutions
    Vieira, Gabriel Lefundes
    Oliveira, Luciano
    2021 34TH SIBGRAPI CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI 2021), 2021: 49 - 56
  • [47] Attention to Emotions: Body Emotion Recognition In-the-Wild Using Self-attention Transformer Network
    Paiva, Pedro V. V.
    Ramos, Josue J. G.
    Gavrilova, Marina
    Carvalho, Marco A. G.
    COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS, VISIGRAPP 2023, 2024, 2103: 206 - 228
  • [48] Local self-attention in transformer for visual question answering
    Shen, Xiang
    Han, Dezhi
    Guo, Zihan
    Chen, Chongqing
    Hua, Jie
    Luo, Gaofeng
    APPLIED INTELLIGENCE, 2023, 53 (13): 16706 - 16723
  • [49] Self-regularised Minimum Latency Training for Streaming Transformer-based Speech Recognition
    Li, Mohan
    Doddipatla, Rama Sanand
    Zorila, Catalin
    INTERSPEECH 2022, 2022, : 2088 - 2092