Multi-modal Attention for Speech Emotion Recognition

Cited by: 26
Authors
Pan, Zexu [1 ,2 ]
Luo, Zhaojie [3 ]
Yang, Jichen [4 ]
Li, Haizhou [1 ,4 ]
Affiliations
[1] National University of Singapore (NUS), Institute of Data Science, Singapore
[2] National University of Singapore (NUS), Graduate School for Integrative Sciences and Engineering, Singapore
[3] Osaka University, Osaka, Japan
[4] National University of Singapore (NUS), Department of Electrical and Computer Engineering, Singapore
Funding
National Research Foundation, Singapore
Keywords
speech emotion recognition; multi-modal attention; early fusion; hybrid fusion; sentiment analysis
DOI
10.21437/Interspeech.2020-1653
Abstract
Emotion is an essential aspect of human speech that is manifested in speech prosody. Speech, visual, and textual cues are complementary in human communication. In this paper, we study a hybrid fusion method, referred to as the multi-modal attention network (MMAN), that makes use of visual and textual cues in speech emotion recognition. We propose a novel multi-modal attention mechanism, cLSTM-MMA, which facilitates attention across the three modalities and selectively fuses their information. cLSTM-MMA is then combined with uni-modal sub-networks at the late fusion stage. Experiments show that speech emotion recognition benefits significantly from visual and textual cues, and that the proposed cLSTM-MMA alone is as competitive as other fusion methods in accuracy while having a much more compact network structure. The proposed hybrid network MMAN achieves state-of-the-art performance on the IEMOCAP database for emotion recognition.
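The abstract outlines a hybrid design: per-modality contextual LSTM (cLSTM) encoders, an attention block that fuses information across the three modalities, and a late-fusion combination with uni-modal sub-networks. Below is a minimal sketch of such a cross-modal attention block in PyTorch; the class name, layer sizes, feature dimensions, and exact attention wiring are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of a cLSTM-style multi-modal attention block, loosely
# following the MMAN description in the abstract. All hyperparameters
# and the attention wiring are assumptions for illustration only.
import torch
import torch.nn as nn

class MultiModalAttention(nn.Module):
    """Per-modality BiLSTM encoders + shared cross-modal attention.

    Each modality (audio, visual, text) is encoded by its own BiLSTM;
    every time step then attends over the concatenated sequence of all
    three modalities, so information is fused selectively across them.
    """

    def __init__(self, dims=(40, 512, 300), hidden=128, n_heads=4, n_classes=4):
        super().__init__()
        # One BiLSTM ("contextual LSTM") encoder per modality.
        self.encoders = nn.ModuleList(
            nn.LSTM(d, hidden, batch_first=True, bidirectional=True) for d in dims
        )
        # Multi-head attention over the concatenated encoded sequences.
        self.attn = nn.MultiheadAttention(2 * hidden, n_heads, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, audio, visual, text):
        # Encode each modality independently: (B, T_m, 2*hidden).
        encoded = [enc(x)[0] for enc, x in zip(self.encoders, (audio, visual, text))]
        fused = torch.cat(encoded, dim=1)   # (B, T_a + T_v + T_t, 2*hidden)
        # Each time step attends across all three modalities.
        attended, _ = self.attn(fused, fused, fused)
        pooled = attended.mean(dim=1)       # temporal average pooling
        return self.classifier(pooled)      # emotion class logits


if __name__ == "__main__":
    # Toy batch: 2 utterances with different sequence lengths per modality.
    model = MultiModalAttention()
    logits = model(
        torch.randn(2, 100, 40),   # e.g. 40-dim filterbank frames
        torch.randn(2, 30, 512),   # e.g. per-frame face embeddings
        torch.randn(2, 20, 300),   # e.g. 300-dim word embeddings
    )
    print(logits.shape)  # torch.Size([2, 4])
```

In the full hybrid network described above, the logits of such a fused branch would additionally be combined with those of speech-only, visual-only, and text-only sub-networks at the late fusion stage; that score-level combination step is omitted here for brevity.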
Pages: 364-368
Page count: 5
Related Papers
50 records in total
  • [21] Multi-Modal Emotion Recognition From Speech and Facial Expression Based on Deep Learning
    Cai, Linqin
    Dong, Jiangong
    Wei, Min
    2020 CHINESE AUTOMATION CONGRESS (CAC 2020), 2020, : 5726 - 5729
  • [22] Multi-Modal Fusion Emotion Recognition Method of Speech Expression Based on Deep Learning
    Liu, Dong
    Wang, Zhiyong
    Wang, Lifeng
    Chen, Longxi
    FRONTIERS IN NEUROROBOTICS, 2021, 15
  • [23] Implementation of Multi-modal Speech Emotion Recognition Using Text Data and Audio Signals
    Adesola, Falade
    Adeyinka, Omirinlewo
    Kayode, Akindeji
    Ayodele, Adebiyi
    2023 International Conference on Science, Engineering and Business for Sustainable Development Goals, SEB-SDG 2023, 2023,
  • [24] A multi-modal emotion fusion classification method combined expression and speech based on attention mechanism
    Liu, Dong
    Chen, Longxi
    Wang, Lifeng
    Wang, Zhiyong
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (29) : 41677 - 41695
  • [26] Multi-Modal Emotion Recognition Fusing Video and Audio
    Xu, Chao
    Du, Pufeng
    Feng, Zhiyong
    Meng, Zhaopeng
    Cao, Tianyi
    Dong, Caichao
    APPLIED MATHEMATICS & INFORMATION SCIENCES, 2013, 7 (02): : 455 - 462
  • [27] A Multi-Modal Deep Learning Approach for Emotion Recognition
    Shahzad, H. M.
    Bhatti, Sohail Masood
    Jaffar, Arfan
    Rashid, Muhammad
    INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2023, 36 (02): : 1561 - 1570
  • [28] Multi-modal Emotion Recognition for Determining Employee Satisfaction
    Zaman, Farhan Uz
    Zaman, Maisha Tasnia
    Alam, Md Ashraful
    Alam, Md Golam Rabiul
    2021 IEEE ASIA-PACIFIC CONFERENCE ON COMPUTER SCIENCE AND DATA ENGINEERING (CSDE), 2021,
  • [29] Emotion recognition with multi-modal peripheral physiological signals
    Gohumpu, Jennifer
    Xue, Mengru
    Bao, Yanchi
    FRONTIERS IN COMPUTER SCIENCE, 2023, 5
  • [30] Semantic Alignment Network for Multi-Modal Emotion Recognition
    Hou, Mixiao
    Zhang, Zheng
    Liu, Chang
    Lu, Guangming
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (09) : 5318 - 5329