Hybrid LSTM-Transformer Model for Emotion Recognition From Speech Audio Files

Cited by: 50
Authors
Andayani, Felicia [1]
Theng, Lau Bee [1]
Tsun, Mark Teekit [1]
Chua, Caslon [2]
Affiliations
[1] Swinburne Univ Technol, Fac Engn Comp & Sci, Sarawak Campus, Sarawak 93350, Malaysia
[2] Swinburne Univ Technol, Fac Sci Engn & Technol, Melbourne, Vic 3122, Australia
Source
IEEE ACCESS | 2022 / Vol. 10
Keywords
Feature extraction; Speech recognition; Transformers; Emotion recognition; Task analysis; Convolutional neural networks; Spectrogram; Attention mechanism; long short-term memory network; speech emotion recognition; transformer encoder;
DOI
10.1109/ACCESS.2022.3163856
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Emotion is a vital component in daily human communication and it helps people understand each other. Emotion recognition plays a crucial role in developing human-computer interaction and computer-based speech emotion recognition. In a nutshell, Speech Emotion Recognition (SER) recognizes emotion signals transmitted through human speech or daily conversation where the emotions in a speech strongly depend on temporal information. Despite the fact that much existing research showed that a hybrid system performs better than traditional single classifiers used in SER, there are some limitations in each of them. As a result, this paper discussed a proposed hybrid Long Short-Term Memory (LSTM) Network and Transformer Encoder to learn the long-term dependencies in speech signals and classify emotions. Speech features are extracted with Mel Frequency Cepstral Coefficient (MFCC) and fed into the proposed hybrid LSTM-Transformer classifier. A range of performance evaluations was conducted on the proposed LSTM-Transformer model. The results indicate that it achieves a significant recognition improvement compared with existing models offered by other published works. The proposed hybrid model reached 75.62%, 85.55%, and 72.49% recognition success with the RAVDESS, Emo-DB, and language-independent datasets.
Pages: 36018-36027
Number of pages: 10
Related papers
50 in total
  • [1] Recognition of Emotion in Speech-related Audio Files with LSTM-Transformer
    Andayani, Felicia
    Theng, Lau Bee
    Tsun, Mark TeeKit
    Chua, Caslon
    5TH INTERNATIONAL CONFERENCE ON COMPUTING AND INFORMATICS (ICCI 2022), 2022, : 87 - 91
  • [2] Hybrid LSTM-Attention and CNN Model for Enhanced Speech Emotion Recognition
    Makhmudov, Fazliddin
    Kutlimuratov, Alpamis
    Cho, Young-Im
    APPLIED SCIENCES-BASEL, 2024, 14 (23):
  • [3] Hybrid LSTM-Transformer Model for the Prediction of Epileptic Seizure Using Scalp EEG
    Xia, Lili
    Wang, Ruiqi
    Ye, Haiming
    Jiang, Bochang
    Li, Guang
    Ma, Chao
    Gao, Zhongke
    IEEE SENSORS JOURNAL, 2024, 24 (13) : 21123 - 21131
  • [4] Automatic excavator action recognition and localisation for untrimmed video using hybrid LSTM-Transformer networks
    Martin, Abbey
    Hill, Andrew J.
    Seiler, Konstantin M.
    Balamurali, Mehala
    INTERNATIONAL JOURNAL OF MINING RECLAMATION AND ENVIRONMENT, 2024, 38 (05) : 353 - 372
  • [5] Speech Emotion Recognition in Multimodal Environments with Transformer: Arabic and English Audio Datasets
    Mohamed, Esraa A.
    Koura, Abdelrahim
    Kayed, Mohammed
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (03) : 581 - 592
  • [6] Speech recognition on MPEG/audio encoded files
    Yapp, L
    Zick, G
    IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA COMPUTING AND SYSTEMS '97, PROCEEDINGS, 1997, : 624 - 625
  • [7] Joint recognition and parameter estimation of cognitive radar work modes with LSTM-transformer
    Zhang, Ziwei
    Zhu, Mengtao
    Li, Yunjie
    Li, Yan
    Wang, Shafei
    DIGITAL SIGNAL PROCESSING, 2023, 140
  • [8] Composite Foundation Settlement Prediction Based on LSTM-Transformer Model for CFG
    Li, Zichao
    Peng, Yipu
    Li, Jian
    Tang, Zhiyuan
    APPLIED SCIENCES-BASEL, 2024, 14 (02):
  • [9] Speech Emotion Recognition Based on Speech Segment Using LSTM with Attention Model
    Atmaja, Bagus Tris
    Akagi, Masato
    2019 IEEE INTERNATIONAL CONFERENCE ON SIGNALS AND SYSTEMS (ICSIGSYS), 2019, : 40 - 44
  • [10] Prediction and analysis of sea surface temperature based on LSTM-transformer model
    Fu, Yu
    Song, Jun
    Guo, Junru
    Fu, Yanzhao
    Cai, Yu
    REGIONAL STUDIES IN MARINE SCIENCE, 2024, 78