ESAformer: Enhanced Self-Attention for Automatic Speech Recognition

Cited by: 3
Authors
Li, Junhua [1]
Duan, Zhikui [1]
Li, Shiren [2]
Yu, Xinmei [1]
Yang, Guangguang [1]
Affiliations
[1] Foshan University, Foshan 528000, People's Republic of China
[2] Sun Yat-sen University, Guangzhou 510275, People's Republic of China
Keywords
Feature extraction; Transformers; Convolution; Logic gates; Testing; Tensors; Training; Speech recognition; transformer; enhanced self-attention; multi-order interaction; TRANSFORMER;
DOI
10.1109/LSP.2024.3358754
CLC (Chinese Library Classification)
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
In this letter, an Enhanced Self-Attention (ESA) module is proposed for feature extraction. The ESA integrates recursive gated convolution with the self-attention mechanism: the former captures multi-order feature interactions, while the latter performs global feature extraction. The question of where in the network the ESA module is best inserted is also explored. Here, the ESA is embedded into the encoder layers of the Transformer network for automatic speech recognition (ASR), and the resulting model is named ESAformer. Its effectiveness is validated on three datasets: Aishell-1, HKUST, and WSJ. Experimental results show that, compared with the baseline Transformer, the ESAformer improves CER by 0.8% on Aishell-1 and by 1.2% on HKUST, and WER by 0.7%/0.4% on WSJ.
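The abstract describes the ESA module as a combination of recursive gated convolution (multi-order local feature interaction) and self-attention (global feature extraction). Below is a minimal PyTorch sketch of that idea. The gated-convolution branch follows the gnConv design from HorNet (Rao et al.); the class names, interaction order, kernel size, and additive fusion of the two branches are illustrative assumptions, not the authors' exact implementation.

import torch
import torch.nn as nn

class RecursiveGatedConv1d(nn.Module):
    """gnConv-style multi-order gated convolution over (B, T, C) sequences.
    Hypothetical sketch; order and kernel size are assumed, not from the paper."""
    def __init__(self, dim, order=3):
        super().__init__()
        # Channel widths per order, e.g. dim=256, order=3 -> [64, 128, 256].
        self.dims = [dim // 2 ** i for i in range(order)][::-1]
        self.proj_in = nn.Conv1d(dim, 2 * dim, kernel_size=1)
        self.dwconv = nn.Conv1d(sum(self.dims), sum(self.dims), kernel_size=7,
                                padding=3, groups=sum(self.dims))
        self.pws = nn.ModuleList(
            nn.Conv1d(self.dims[i], self.dims[i + 1], kernel_size=1)
            for i in range(order - 1))
        self.proj_out = nn.Conv1d(dim, dim, kernel_size=1)

    def forward(self, x):
        x = x.transpose(1, 2)  # (B, C, T) layout for Conv1d
        gate, feats = torch.split(self.proj_in(x),
                                  [self.dims[0], sum(self.dims)], dim=1)
        feats = list(torch.split(self.dwconv(feats), self.dims, dim=1))
        y = gate * feats[0]                 # 1st-order gated interaction
        for i, pw in enumerate(self.pws):   # recursively raise the order
            y = pw(y) * feats[i + 1]
        return self.proj_out(y).transpose(1, 2)  # back to (B, T, C)

class ESABlock(nn.Module):
    """Assumed fusion of the two branches: self-attention for global context
    plus recursive gated convolution for local multi-order interaction."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.gconv = RecursiveGatedConv1d(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        return self.norm(x + attn_out + self.gconv(x))

# Smoke test: batch of 2 utterances, 100 frames, 256-dim acoustic features.
x = torch.randn(2, 100, 256)
print(ESABlock(256)(x).shape)  # torch.Size([2, 100, 256])

In a full ESAformer-style encoder, a block like this would replace the plain multi-head self-attention sublayer of each Transformer encoder layer, leaving the feed-forward sublayer unchanged; whether the branches are fused additively, as here, or in some other way is one of the placement/design choices the letter investigates.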
Pages: 471-475
Page count: 5