Attention-LSTM-Attention Model for Speech Emotion Recognition and Analysis of IEMOCAP Database

被引:55
|
作者
Yu, Yeonguk [1 ]
Kim, Yoon-Joong [1 ]
机构
[1] Hanbat Natl Univ, Dept Comp Engn, Daejeon 34158, South Korea
关键词
speech-emotion recognition; attention mechanism; LSTM;
D O I
10.3390/electronics9050713
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We propose a speech-emotion recognition (SER) model with an "attention-long Long Short-Term Memory (LSTM)-attention" component to combine IS09, a commonly used feature for SER, and mel spectrogram, and we analyze the reliability problem of the interactive emotional dyadic motion capture (IEMOCAP) database. The attention mechanism of the model focuses on emotion-related elements of the IS09 and mel spectrogram feature and the emotion-related duration from the time of the feature. Thus, the model extracts emotion information from a given speech signal. The proposed model for the baseline study achieved a weighted accuracy (WA) of 68% for the improvised dataset of IEMOCAP. However, the WA of the proposed model of the main study and modified models could not achieve more than 68% in the improvised dataset. This is because of the reliability limit of the IEMOCAP dataset. A more reliable dataset is required for a more accurate evaluation of the model's performance. Therefore, in this study, we reconstructed a more reliable dataset based on the labeling results provided by IEMOCAP. The experimental results of the model for the more reliable dataset confirmed a WA of 73%.
引用
收藏
页数:12
相关论文
共 50 条
  • [41] 3-D Convolutional Recurrent Neural Networks With Attention Model for Speech Emotion Recognition
    Chen, Mingyi
    He, Xuanji
    Yang, Jing
    Zhang, Han
    IEEE SIGNAL PROCESSING LETTERS, 2018, 25 (10) : 1440 - 1444
  • [42] An Attention Pooling based Representation Learning Method for Speech Emotion Recognition
    Li, Pengcheng
    Song, Yan
    McLoughlin, Ian
    Guo, Wu
    Dai, Lirong
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3087 - 3091
  • [43] Pre-trained Deep Convolution Neural Network Model With Attention for Speech Emotion Recognition
    Zhang, Hua
    Gou, Ruoyun
    Shang, Jili
    Shen, Fangyao
    Wu, Yifan
    Dai, Guojun
    FRONTIERS IN PHYSIOLOGY, 2021, 12
  • [44] BAT: Block and token self-attention for speech emotion recognition
    Lei, Jianjun
    Zhu, Xiangwei
    Wang, Ying
    Neural Networks, 2022, 156 : 67 - 80
  • [45] Attention gated tensor neural network architectures for speech emotion recognition
    Pandey, Sandeep Kumar
    Shekhawat, Hanumant Singh
    Prasanna, S. R. M.
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2022, 71
  • [46] A Review on Speech Emotion Recognition Using Deep Learning and Attention Mechanism
    Lieskovska, Eva
    Jakubec, Maros
    Jarina, Roman
    Chmulik, Michal
    ELECTRONICS, 2021, 10 (10)
  • [47] Speech Emotion Recognition Based on Attention MCNN Combined With Gender Information
    Hu, Zhangfang
    LingHu, Kehuan
    Yu, Hongling
    Liao, Chenzhuo
    IEEE ACCESS, 2023, 11 : 50285 - 50294
  • [48] Improving Speech Emotion Recognition Through Focus and Calibration Attention Mechanisms
    Kim, Junghun
    An, Yoojin
    Kim, Jihie
    INTERSPEECH 2022, 2022, : 136 - 140
  • [49] Attention in Convolutional LSTM for Gesture Recognition
    Zhang, Liang
    Zhu, Guangming
    Mei, Lin
    Shen, Peiyi
    Shah, Syed Afaq Ali
    Bennamoun, Mohammed
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [50] MULTI-HEAD ATTENTION FOR SPEECH EMOTION RECOGNITION WITH AUXILIARY LEARNING OF GENDER RECOGNITION
    Nediyanchath, Anish
    Paramasivam, Periyasamy
    Yenigalla, Promod
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7179 - 7183