Multi-modal Attention for Speech Emotion Recognition

Cited by: 26
Authors
Pan, Zexu [1 ,2 ]
Luo, Zhaojie [3 ]
Yang, Jichen [4 ]
Li, Haizhou [1 ,4 ]
Affiliations
[1] National University of Singapore (NUS), Institute of Data Science, Singapore
[2] National University of Singapore (NUS), Graduate School for Integrative Sciences and Engineering, Singapore
[3] Osaka University, Osaka, Japan
[4] National University of Singapore (NUS), Department of Electrical and Computer Engineering, Singapore
Source
INTERSPEECH 2020
Funding
National Research Foundation, Singapore;
Keywords
speech emotion recognition; multi-modal attention; early fusion; hybrid fusion; SENTIMENT ANALYSIS;
DOI
10.21437/Interspeech.2020-1653
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline codes
100104; 100213;
Abstract
Emotion represents an essential aspect of human speech that is manifested in speech prosody. Speech, visual, and textual cues are complementary in human communication. In this paper, we study a hybrid fusion method, referred to as the multi-modal attention network (MMAN), that makes use of visual and textual cues in speech emotion recognition. We propose a novel multi-modal attention mechanism, cLSTM-MMA, which facilitates attention across three modalities and selectively fuses the information. cLSTM-MMA is fused with other uni-modal sub-networks in a late fusion. The experiments show that speech emotion recognition benefits significantly from visual and textual cues, and that the proposed cLSTM-MMA alone is as competitive as other fusion methods in terms of accuracy, but with a much more compact network structure. The proposed hybrid network MMAN achieves state-of-the-art performance on the IEMOCAP database for emotion recognition.
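The abstract's core idea of attending across three modalities and selectively fusing them can be illustrated with a minimal sketch. The function and variable names below are hypothetical, not the authors' implementation: it shows one head of scaled dot-product attention over three per-utterance modality embeddings (as might come from cLSTM encoders), followed by a hybrid-fusion step that concatenates the cross-modal fused vector with the uni-modal features.

```python
# Hypothetical sketch of multi-modal attention fusion; names and dimensions
# are illustrative, not taken from the paper's actual architecture.
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def multimodal_attention(query, modality_feats):
    """Attend from one modality's query over all modality features and
    return their attention-weighted sum (single-head scaled dot-product)."""
    d = len(query)
    scores = [dot(query, f) / math.sqrt(d) for f in modality_feats]
    weights = softmax(scores)
    fused = [sum(w * f[i] for w, f in zip(weights, modality_feats))
             for i in range(d)]
    return fused, weights

# Toy per-modality utterance embeddings (e.g. outputs of uni-modal encoders).
audio = [0.9, 0.1, 0.0]
video = [0.2, 0.8, 0.1]
text  = [0.1, 0.2, 0.9]
feats = [audio, video, text]

# The audio stream queries all three modalities; the learned-style weights
# decide how much each modality contributes ("selective fusion").
fused_audio, weights = multimodal_attention(audio, feats)

# Hybrid fusion: concatenate the cross-modal fused vector with the
# uni-modal features before a final classifier (late-fusion stage).
hybrid = fused_audio + audio + video + text
```

In the full model each modality would issue its own query and the fused vectors would be combined with the uni-modal sub-network outputs; this sketch only demonstrates the attention-then-concatenate pattern the abstract describes.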
Pages: 364-368
Number of pages: 5
Related papers
50 records in total
  • [1] Multi-head attention fusion networks for multi-modal speech emotion recognition
    Zhang, Junfeng
    Xing, Lining
    Tan, Zhen
    Wang, Hongsen
    Wang, Kesheng
    COMPUTERS & INDUSTRIAL ENGINEERING, 2022, 168
  • [2] Multi-modal Emotion Recognition Based on Speech and Image
    Li, Yongqiang
    He, Qi
    Zhao, Yongping
    Yao, Hongxun
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2017, PT I, 2018, 10735 : 844 - 853
  • [3] Multi-modal Correlated Network for emotion recognition in speech
    Ren, Minjie
    Nie, Weizhi
    Liu, Anan
    Su, Yuting
    VISUAL INFORMATICS, 2019, 3 (03) : 150 - 155
  • [4] ATTENTION DRIVEN FUSION FOR MULTI-MODAL EMOTION RECOGNITION
    Priyasad, Darshana
    Fernando, Tharindu
    Denman, Simon
    Sridharan, Sridha
    Fookes, Clinton
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 3227 - 3231
  • [5] Multi-modal emotion recognition using EEG and speech signals
    Wang, Qian
    Wang, Mou
    Yang, Yan
    Zhang, Xiaolei
    COMPUTERS IN BIOLOGY AND MEDICINE, 2022, 149
  • [6] Dense Attention Memory Network for Multi-modal emotion recognition
    Ma, Gailing
    Guo, Xiao
    2022 5TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND NATURAL LANGUAGE PROCESSING, MLNLP 2022, 2022, : 48 - 53
  • [7] Contextual and Cross-Modal Interaction for Multi-Modal Speech Emotion Recognition
    Yang, Dingkang
    Huang, Shuai
    Liu, Yang
    Zhang, Lihua
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 2093 - 2097
  • [8] A Two-Stage Attention Based Modality Fusion Framework for Multi-Modal Speech Emotion Recognition
    Hu, Dongni
    Chen, Chengxin
    Zhang, Pengyuan
    Li, Junfeng
    Yan, Yonghong
    Zhao, Qingwei
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2021, E104D (08) : 1391 - 1394
  • [9] Multi-modal Emotion Recognition using Speech Features and Text Embedding
    Kim J.-H.
    Lee S.-P.
    Transactions of the Korean Institute of Electrical Engineers, 2021, 70 (01): : 108 - 113
  • [10] A novel signal channel attention network for multi-modal emotion recognition
    Du, Ziang
    Ye, Xia
    Zhao, Pujie
    FRONTIERS IN NEUROROBOTICS, 2024, 18