ESAformer: Enhanced Self-Attention for Automatic Speech Recognition

Cited: 3
Authors
Li, Junhua [1 ]
Duan, Zhikui [1 ]
Li, Shiren [2 ]
Yu, Xinmei [1 ]
Yang, Guangguang [1 ]
Affiliations
[1] Foshan University, Foshan 528000, People's Republic of China
[2] Sun Yat-sen University, Guangzhou 510275, People's Republic of China
Keywords
Feature extraction; Transformers; Convolution; Logic gates; Testing; Tensors; Training; Speech recognition; transformer; enhanced self-attention; multi-order interaction
DOI
10.1109/LSP.2024.3358754
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Discipline Classification Codes
0808; 0809
Abstract
In this letter, an Enhanced Self-Attention (ESA) module is proposed for feature extraction. The ESA combines recursive gated convolution with the self-attention mechanism: the former captures multi-order feature interactions, while the latter extracts global features. The letter also explores where in the network the ESA is best inserted; it is embedded into the encoder layers of a Transformer network for automatic speech recognition (ASR), and the resulting model is named ESAformer. Its effectiveness is validated on three datasets: Aishell-1, HKUST and WSJ. Compared with the Transformer baseline, the ESAformer improves CER by 0.8% on Aishell-1 and 1.2% on HKUST, and WER by 0.7%/0.4% on WSJ.
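The abstract names the two ingredients of the ESA module but not their exact wiring, so the following PyTorch sketch is only one plausible reading, not the letter's implementation: a 1-D recursive gated convolution (in the gnConv style) that raises the interaction order through successive element-wise gating, paired with standard multi-head self-attention for global context inside a Transformer-style encoder block. The names RecursiveGatedConv1d and ESABlock, the order-3 setting, the residual layout and all dimensions are illustrative assumptions.

import torch
import torch.nn as nn

class RecursiveGatedConv1d(nn.Module):
    # Order-n recursive gated 1-D convolution (gnConv-style sketch) for
    # multi-order feature interactions on (batch, channels, time) tensors.
    def __init__(self, dim: int, order: int = 3, kernel_size: int = 7):
        super().__init__()
        # Channel widths dim/2^(order-1), ..., dim/2, dim (smallest first);
        # their sum plus the first width equals 2*dim, matching proj_in.
        self.dims = [dim // 2 ** i for i in range(order)][::-1]
        self.proj_in = nn.Conv1d(dim, 2 * dim, 1)
        self.dwconv = nn.Conv1d(sum(self.dims), sum(self.dims), kernel_size,
                                padding=kernel_size // 2, groups=sum(self.dims))
        self.pws = nn.ModuleList(
            nn.Conv1d(self.dims[i], self.dims[i + 1], 1)
            for i in range(order - 1)
        )
        self.proj_out = nn.Conv1d(dim, dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, T)
        gate, feats = torch.split(self.proj_in(x),
                                  (self.dims[0], sum(self.dims)), dim=1)
        feats = torch.split(self.dwconv(feats), self.dims, dim=1)
        x = gate * feats[0]                     # first-order interaction
        for pw, f in zip(self.pws, feats[1:]):  # raise the interaction order
            x = pw(x) * f
        return self.proj_out(x)

class ESABlock(nn.Module):
    # One illustrative encoder block: recursive gated convolution for local
    # multi-order interactions, then self-attention for global context.
    def __init__(self, dim: int = 256, heads: int = 4, order: int = 3):
        super().__init__()
        self.gconv = RecursiveGatedConv1d(dim, order)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, C)
        y = self.norm1(x)
        x = x + self.gconv(y.transpose(1, 2)).transpose(1, 2)  # local branch
        y = self.norm2(x)
        x = x + self.attn(y, y, y, need_weights=False)[0]      # global branch
        return x

if __name__ == "__main__":
    feats = torch.randn(2, 100, 256)  # (batch, frames, feature dim)
    print(ESABlock()(feats).shape)    # torch.Size([2, 100, 256])

Both branches are residual and shape-preserving, so a block like this can replace (or augment) a standard Transformer encoder layer without changing the surrounding ASR pipeline, which is consistent with the abstract's claim that the ESA is embedded into the encoder layers.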
Pages: 471-475 (5 pages)
Related Papers
50 records in total; items [31]-[40] shown
  • [31] Self-Attention Encoding and Pooling for Speaker Recognition
    Safari, Pooyan
    India, Miquel
    Hernando, Javier
    INTERSPEECH 2020, 2020, : 941 - 945
  • [32] Towards Self-Attention Understanding for Automatic Articulatory Processes Analysis in Cleft Lip and Palate Speech
    Baumann, Ilja
    Wagner, Dominik
    Schuster, Maria
    Riedhammer, Korbinian
    Noeth, Elmar
    Bocklet, Tobias
    INTERSPEECH 2024, 2024, : 2430 - 2434
  • [33] Cyclic Self-attention for Point Cloud Recognition
    Zhu, Guanyu
    Zhou, Yong
    Yao, Rui
    Zhu, Hancheng
    Zhao, Jiaqi
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (01)
  • [34] SELF-ATTENTION GENERATIVE ADVERSARIAL NETWORK FOR SPEECH ENHANCEMENT
    Huy Phan
    Nguyen, Huy Le
    Chen, Oliver Y.
    Koch, Philipp
    Duong, Ngoc Q. K.
    McLoughlin, Ian
    Mertins, Alfred
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7103 - 7107
  • [35] Speaker-Aware Speech Enhancement with Self-Attention
    Lin, Ju
    Van Wijngaarden, Adriaan J.
    Smith, Melissa C.
    Wang, Kuang-Ching
    29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021, : 486 - 490
  • [36] Adversarial joint training with self-attention mechanism for robust end-to-end speech recognition
    Li, Lujun
    Kang, Yikai
    Shi, Yuchen
    Kurzinger, Ludwig
    Watzel, Tobias
    Rigoll, Gerhard
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2021, 2021 (01)
  • [37] Time-Frequency Deep Representation Learning for Speech Emotion Recognition Integrating Self-attention
    Liu, Jiaxing
    Liu, Zhilei
    Wang, Longbiao
    Guo, Lili
    Dang, Jianwu
    NEURAL INFORMATION PROCESSING (ICONIP 2019), PT IV, 2019, 1142 : 681 - 689
  • [39] Attention Enhanced Citrinet for Speech Recognition
    Wu, Xianchao
    INTERSPEECH 2022, 2022, : 2108 - 2112
  • [40] SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition
    Gao, Zhifu
    Zhang, Shiliang
    Lei, Ming
    McLoughlin, Ian
    INTERSPEECH 2020, 2020, : 6 - 10