ESAformer: Enhanced Self-Attention for Automatic Speech Recognition

Cited by: 3
Authors
Li, Junhua [1]
Duan, Zhikui [1]
Li, Shiren [2]
Yu, Xinmei [1]
Yang, Guangguang [1]
Affiliations
[1] Foshan University, Foshan 528000, People's Republic of China
[2] Sun Yat-sen University, Guangzhou 510275, People's Republic of China
Keywords
Feature extraction; Transformers; Convolution; Logic gates; Testing; Tensors; Training; Speech recognition; transformer; enhanced self-attention; multi-order interaction; TRANSFORMER;
DOI
10.1109/LSP.2024.3358754
CLC (Chinese Library Classification)
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
In this letter, an Enhanced Self-Attention (ESA) module is proposed for feature extraction. The ESA integrates recursive gated convolution with the self-attention mechanism: the former captures multi-order feature interactions, while the latter performs global feature extraction. The question of where in the network the ESA module is best inserted is also explored. Here, the ESA is embedded into the encoder layers of the Transformer network for automatic speech recognition (ASR), and the resulting model is named ESAformer. Its effectiveness is validated on three datasets: Aishell-1, HKUST, and WSJ. Experimental results show that, compared with the baseline Transformer, the ESAformer improves CER by 0.8% on Aishell-1 and by 1.2% on HKUST, and WER by 0.7%/0.4% on WSJ.
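The abstract describes the ESA module as a combination of recursive gated convolution (multi-order local feature interaction) and self-attention (global feature extraction). Below is a minimal PyTorch sketch of that idea. The gated-convolution branch follows the gnConv design from HorNet (Rao et al.); the class names, interaction order, kernel size, and additive fusion of the two branches are illustrative assumptions, not the authors' exact implementation.

import torch
import torch.nn as nn

class RecursiveGatedConv1d(nn.Module):
    """gnConv-style multi-order gated convolution over (B, T, C) sequences.
    Hypothetical sketch; order and kernel size are assumed, not from the paper."""
    def __init__(self, dim, order=3):
        super().__init__()
        # Channel widths per order, e.g. dim=256, order=3 -> [64, 128, 256].
        self.dims = [dim // 2 ** i for i in range(order)][::-1]
        self.proj_in = nn.Conv1d(dim, 2 * dim, kernel_size=1)
        self.dwconv = nn.Conv1d(sum(self.dims), sum(self.dims), kernel_size=7,
                                padding=3, groups=sum(self.dims))
        self.pws = nn.ModuleList(
            nn.Conv1d(self.dims[i], self.dims[i + 1], kernel_size=1)
            for i in range(order - 1))
        self.proj_out = nn.Conv1d(dim, dim, kernel_size=1)

    def forward(self, x):
        x = x.transpose(1, 2)  # (B, C, T) layout for Conv1d
        gate, feats = torch.split(self.proj_in(x),
                                  [self.dims[0], sum(self.dims)], dim=1)
        feats = list(torch.split(self.dwconv(feats), self.dims, dim=1))
        y = gate * feats[0]                 # 1st-order gated interaction
        for i, pw in enumerate(self.pws):   # recursively raise the order
            y = pw(y) * feats[i + 1]
        return self.proj_out(y).transpose(1, 2)  # back to (B, T, C)

class ESABlock(nn.Module):
    """Assumed fusion of the two branches: self-attention for global context
    plus recursive gated convolution for local multi-order interaction."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.gconv = RecursiveGatedConv1d(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        return self.norm(x + attn_out + self.gconv(x))

# Smoke test: batch of 2 utterances, 100 frames, 256-dim acoustic features.
x = torch.randn(2, 100, 256)
print(ESABlock(256)(x).shape)  # torch.Size([2, 100, 256])

In a full ESAformer-style encoder, a block like this would replace the plain multi-head self-attention sublayer of each Transformer encoder layer, leaving the feed-forward sublayer unchanged; whether the branches are fused additively, as here, or in some other way is one of the placement/design choices the letter investigates.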
Pages: 471-475
Page count: 5