Empirical Interpretation of Speech Emotion Perception with Attention Based Model for Speech Emotion Recognition

Cited by: 19
Authors
Jalal, Md Asif [1 ]
Milner, Rosanna [1 ]
Hain, Thomas [1 ]
Affiliations
[1] Univ Sheffield, Speech & Hearing Grp SPandH, Sheffield, S Yorkshire, England
Source
INTERSPEECH 2020
Keywords
speech emotion recognition; speech emotion intelligibility; computational paralinguistics;
DOI
10.21437/Interspeech.2020-3007
Chinese Library Classification (CLC)
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject Classification Code
100104; 100213
Abstract
Speech emotion recognition is essential for emotional intelligence, which affects the understanding of the context and meaning of speech. Harmonically structured vowel and consonant sounds add indexical and linguistic cues to spoken information. Previous research has debated, from psychological and linguistic points of view, whether vowel cues are more important in carrying emotional context. Other research has claimed that emotion information can exist in small, overlapping acoustic cues. However, these claims have not been corroborated in computational speech emotion recognition systems. In this research, a convolution-based model and a long short-term memory-based model, both using attention, are applied to investigate these theories of speech emotion with computational models. The role of acoustic context and word importance is demonstrated for the task of speech emotion recognition. The proposed models are evaluated on the IEMOCAP corpus, achieving 80.1% unweighted accuracy on purely acoustic data, which is higher than current state-of-the-art models on this task. Phones and words are mapped to the attention vectors, showing that vowel sounds are more important than consonants for defining emotional acoustic cues, and that the model can assign word importance based on acoustic context.
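The abstract only sketches the architecture, so the following PyTorch snippet is a minimal illustrative sketch (not the authors' implementation) of an attention-based recurrent model of the kind described: a bidirectional LSTM encoder over frame-level acoustic features with soft attention pooling, whose per-frame weights could afterwards be aligned with phone and word boundaries. The feature dimension, layer sizes, and the four emotion classes are assumptions made for the example.

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Soft attention over frame-level features, yielding an utterance embedding."""
    def __init__(self, feat_dim: int, attn_dim: int = 128):
        super().__init__()
        self.proj = nn.Linear(feat_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, frames: torch.Tensor):
        # frames: (batch, time, feat_dim)
        energies = self.score(torch.tanh(self.proj(frames)))  # (batch, time, 1)
        weights = torch.softmax(energies, dim=1)               # attention over time
        utterance = (weights * frames).sum(dim=1)              # (batch, feat_dim)
        return utterance, weights.squeeze(-1)                  # weights can be aligned to phones/words

class EmotionClassifier(nn.Module):
    """BLSTM encoder + attention pooling + linear emotion classifier (illustrative only)."""
    def __init__(self, feat_dim: int = 40, hidden: int = 128, n_emotions: int = 4):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.pool = AttentionPooling(2 * hidden)
        self.out = nn.Linear(2 * hidden, n_emotions)

    def forward(self, frames: torch.Tensor):
        encoded, _ = self.encoder(frames)           # (batch, time, 2*hidden)
        utterance, attn = self.pool(encoded)
        return self.out(utterance), attn

# Example: 3 utterances of 300 frames of 40-dim log-Mel features (assumed input format)
model = EmotionClassifier()
logits, attn = model(torch.randn(3, 300, 40))
print(logits.shape, attn.shape)  # torch.Size([3, 4]) torch.Size([3, 300])
```

The returned per-frame attention weights are what would be mapped onto phone and word segments (e.g., from a forced alignment) to inspect which sounds the model treats as emotionally salient, as the abstract describes.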
Pages: 4113-4117
Page count: 5
Related papers
50 records in total
  • [31] Emotion classification from speech signal based on empirical mode decomposition and non-linear features
    Krishnan, Palani Thanaraj
    Alex Noel, Joseph Raj
    Rajangam, Vijayarajan
    COMPLEX & INTELLIGENT SYSTEMS, 2021, 7 (04) : 1919 - 1934
  • [32] Speech Emotion Recognition Based on Attention MCNN Combined With Gender Information
    Hu, Zhangfang
    LingHu, Kehuan
    Yu, Hongling
    Liao, Chenzhuo
    IEEE ACCESS, 2023, 11 : 50285 - 50294
  • [33] Attention-LSTM-Attention Model for Speech Emotion Recognition and Analysis of IEMOCAP Database
    Yu, Yeonguk
    Kim, Yoon-Joong
    ELECTRONICS, 2020, 9 (05)
  • [34] Speech Emotion Recognition Model Based on Joint Modeling of Discrete and Dimensional Emotion Representation
    Bautista, John Lorenzo
    Shin, Hyun Soon
    APPLIED SCIENCES-BASEL, 2025, 15 (02):
  • [35] Autoencoder With Emotion Embedding for Speech Emotion Recognition
    Zhang, Chenghao
    Xue, Lei
    IEEE ACCESS, 2021, 9 : 51231 - 51241
  • [37] Anchor Model Fusion for Emotion Recognition in Speech
    Ortego-Resa, Carlos
    Lopez-Moreno, Ignacio
    Ramos, Daniel
    Gonzalez-Rodriguez, Joaquin
    BIOMETRIC ID MANAGEMENT AND MULTIMODAL COMMUNICATION, PROCEEDINGS, 2009, 5707 : 49 - 56
  • [38] A Lightweight Model Based on Separable Convolution for Speech Emotion Recognition
    Zhong, Ying
    Hu, Ying
    Huang, Hao
    Silamu, Wushour
    INTERSPEECH 2020, 2020, : 3331 - 3335
  • [39] Speech Emotion Recognition Based on Wavelet Packet Coefficient Model
    Wang, Kunxia
    An, Ning
    Li, Lian
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 478 - 482
  • [40] Deep learning based Affective Model for Speech Emotion Recognition
    Zhou, Xi
    Guo, Junqi
    Bie, Rongfang
    2016 INT IEEE CONFERENCES ON UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING AND COMMUNICATIONS, CLOUD AND BIG DATA COMPUTING, INTERNET OF PEOPLE, AND SMART WORLD CONGRESS (UIC/ATC/SCALCOM/CBDCOM/IOP/SMARTWORLD), 2016, : 841 - 846