ESAformer: Enhanced Self-Attention for Automatic Speech Recognition

Cited by: 3
Authors
Li, Junhua [1 ]
Duan, Zhikui [1 ]
Li, Shiren [2 ]
Yu, Xinmei [1 ]
Yang, Guangguang [1 ]
Affiliations
[1] Foshan Univ, Foshan 528000, Peoples R China
[2] Sun Yat Sen Univ, Guangzhou 510275, Peoples R China
Keywords
Feature extraction; Transformers; Convolution; Logic gates; Testing; Tensors; Training; Speech recognition; transformer; enhanced self-attention; multi-order interaction; TRANSFORMER;
DOI
10.1109/LSP.2024.3358754
Chinese Library Classification
TM [Electrotechnics]; TN [Electronic Technology, Communication Technology]
Subject Classification Code
0808; 0809
Abstract
In this letter, an Enhanced Self-Attention (ESA) module is proposed for feature extraction. The ESA module integrates recursive gated convolution with the self-attention mechanism: the former captures multi-order feature interactions, while the latter extracts global features. Where the ESA module is best inserted is also explored. Here, the ESA module is embedded into the encoder layers of the Transformer network for automatic speech recognition (ASR), and the resulting model is named ESAformer. The effectiveness of the ESAformer is validated on three datasets: Aishell-1, HKUST and WSJ. Experimental results show that, compared with the baseline Transformer, the ESAformer improves performance by 0.8% CER on Aishell-1, 1.2% CER on HKUST, and 0.7%/0.4% WER on WSJ.
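
The abstract only outlines the design at a high level. The sketch below is a minimal, hypothetical PyTorch rendering of such an encoder block, pairing a simplified recursive gated convolution branch (gnConv-style, for multi-order interactions) with standard multi-head self-attention (for global features). The class names, dimensions, pre-norm residual layout and the exact gating recursion are assumptions made for illustration, not the authors' implementation.

import torch
import torch.nn as nn


class RecursiveGatedConv(nn.Module):
    # Simplified gnConv-style branch (assumption): a depthwise-convolved value is
    # gated repeatedly ("order" times) to model multi-order feature interactions.
    def __init__(self, dim, order=3, kernel_size=7):
        super().__init__()
        self.proj_in = nn.Linear(dim, 2 * dim)             # split into value + gate
        self.dwconv = nn.Conv1d(dim, dim, kernel_size,
                                padding=kernel_size // 2, groups=dim)
        self.mix = nn.ModuleList([nn.Linear(dim, dim) for _ in range(order - 1)])
        self.proj_out = nn.Linear(dim, dim)

    def forward(self, x):                                   # x: (batch, time, dim)
        value, gate = self.proj_in(x).chunk(2, dim=-1)
        value = self.dwconv(value.transpose(1, 2)).transpose(1, 2)  # local context
        out = value * gate                                   # 1st-order interaction
        for mix in self.mix:                                 # higher-order interactions
            out = mix(out) * gate
        return self.proj_out(out)


class ESAEncoderBlock(nn.Module):
    # Hypothetical encoder layer: multi-order convolution branch plus global
    # self-attention, each with a residual connection and pre-layer normalization.
    def __init__(self, dim=256, num_heads=4, order=3):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.rgc = RecursiveGatedConv(dim, order)
        self.norm2 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        x = x + self.rgc(self.norm1(x))                      # local multi-order features
        h = self.norm2(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]    # global features
        return x


if __name__ == "__main__":
    feats = torch.randn(2, 100, 256)                         # (batch, frames, feature dim)
    print(ESAEncoderBlock()(feats).shape)                    # torch.Size([2, 100, 256])

The split of roles follows the abstract: the gated-convolution branch models local, multi-order interactions, while the attention branch supplies global context; how the two are actually fused inside ESAformer is described in the full letter.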
Pages: 471-475
Page count: 5
Related Papers
50 records in total
  • [21] Exploring Self-Attention Mechanisms for Speech Separation
    Subakan, Cem
    Ravanelli, Mirco
    Cornell, Samuele
    Grondin, Francois
    Bronzi, Mirko
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 2169 - 2180
  • [22] SIMPLIFIED SELF-ATTENTION FOR TRANSFORMER-BASED END-TO-END SPEECH RECOGNITION
    Luo, Haoneng
    Zhang, Shiliang
    Lei, Ming
    Xie, Lei
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 75 - 81
  • [23] Efficient Self-Attention Model for Speech Recognition-Based Assistive Robots Control
    Poirier, Samuel
    Cote-Allard, Ulysse
    Routhier, Francois
    Campeau-Lecours, Alexandre
    SENSORS, 2023, 23 (13)
  • [24] Acoustic model training using self-attention for low-resource speech recognition
    Park, Hosung
    Kim, Ji-Hwan
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2020, 39 (05): 483 - 489
  • [25] Speech Emotion Recognition Based on Self-Attention Weight Correction for Acoustic and Text Features
    Santoso, Jennifer
    Yamada, Takeshi
    Ishizuka, Kenkichi
    Hashimoto, Taiichi
    Makino, Shoji
    IEEE ACCESS, 2022, 10 : 115732 - 115743
  • [26] DILATED RESIDUAL NETWORK WITH MULTI-HEAD SELF-ATTENTION FOR SPEECH EMOTION RECOGNITION
    Li, Runnan
    Wu, Zhiyong
    Jia, Jia
    Zhao, Sheng
    Meng, Helen
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6675 - 6679
  • [27] Enhancing speech emotion recognition: a deep learning approach with self-attention and acoustic features
    Aghajani, Khadijeh
    Zohrevandi, Mahbanou
    JOURNAL OF SUPERCOMPUTING, 2025, 81 (05)
  • [28] SALSA: Swift Adaptive Lightweight Self-Attention for Enhanced LiDAR Place Recognition
    Goswami, Raktim Gautam
    Patel, Naman
    Krishnamurthy, Prashanth
    Khorrami, Farshad
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (10): 8242 - 8249
  • [29] Automatic Food Recognition Using Deep Convolutional Neural Networks with Self-attention Mechanism
    Abiyev, Rahib
    Adepoju, Joseph
    Human-Centric Intelligent Systems, 2024, 4 (1): 171 - 186
  • [30] Speech translation enhanced automatic speech recognition
    Paulik, M
    Stüker, S
    Fügen, C
    Schultz, T
    Schaaf, T
    Waibel, A
    2005 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2005, : 121 - 126