Attentive to Individual: A Multimodal Emotion Recognition Network with Personalized Attention Profile

Cited by: 19
Authors
Li, Jeng-Lin [1 ,2 ]
Lee, Chi-Chun [1 ,2 ]
Affiliations
[1] Natl Tsing Hua Univ, Dept Elect Engn, Hsinchu, Taiwan
[2] MOST Joint Res Ctr AI Technol & All Vista Healthc, Taipei, Taiwan
Keywords
personal attribute; multimodal emotion recognition; attention; psycholinguistic norm;
DOI
10.21437/Interspeech.2019-2044
CLC classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject classification code
100104 ; 100213 ;
Abstract
A growing number of human-centered applications benefit from continuous advances in emotion recognition technology. Many emotion recognition algorithms model multimodal behavioral cues to achieve high performance. However, most of them do not consider how an individual's personal attributes modulate his/her expressive behavior. In this work, we propose a Personalized Attributes-Aware Attention Network (PAaAN) with a novel personalized attention mechanism that performs emotion recognition using speech and language cues. The attention profile is learned from embeddings of an individual's profile, acoustic, and lexical behavior data. The profile embedding is derived using Linguistic Inquiry and Word Count (LIWC) features computed between the target speaker and a large set of movie scripts. Our method achieves a state-of-the-art 70.3% unweighted accuracy on a four-class emotion recognition task on the IEMOCAP corpus. Further analysis reveals that affect-related semantic categories are emphasized differently for each speaker in the corpus, showing the effectiveness of our attention mechanism for personalization.
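The core idea the abstract describes, attention weights over modality features conditioned on a speaker's profile embedding so that different speakers weight the same cues differently, can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's actual architecture: the dimensions, the random projection `W`, and the single-query formulation are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical dimensions; the paper's actual sizes are not given in the abstract.
D_PROFILE, D_FEAT = 8, 16

# Profile embedding for one speaker (per the abstract, derived from LIWC
# category statistics computed against a large set of movie scripts).
profile = rng.normal(size=D_PROFILE)

# Acoustic and lexical feature vectors for one utterance (random stand-ins).
acoustic = rng.normal(size=D_FEAT)
lexical = rng.normal(size=D_FEAT)
modalities = np.stack([acoustic, lexical])      # shape (2, D_FEAT)

# Personalized attention: the query is a function of the speaker profile,
# so the resulting attention profile differs from speaker to speaker.
W = rng.normal(size=(D_PROFILE, D_FEAT)) * 0.1  # hypothetical learned projection
query = profile @ W                             # (D_FEAT,)
scores = modalities @ query                     # one score per modality, (2,)
alpha = softmax(scores)                         # personalized attention profile
fused = alpha @ modalities                      # attention-weighted fused features
```

In a trained model, `W` (and the feature extractors) would be learned jointly with the emotion classifier; swapping in a different speaker's `profile` changes `alpha`, which is the personalization effect the abstract's analysis examines.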
Pages: 211 - 215
Page count: 5
Related papers (50 total)
  • [21] Deep Feature Extraction and Attention Fusion for Multimodal Emotion Recognition
    Yang, Zhiyi
    Li, Dahua
    Hou, Fazheng
    Song, Yu
    Gao, Qiang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2024, 71 (03) : 1526 - 1530
  • [22] Multimodal Attentive Fusion Network for audio-visual event recognition
    Brousmiche, Mathilde
    Rouat, Jean
    Dupont, Stephane
    INFORMATION FUSION, 2022, 85 : 52 - 59
  • [23] Neural correlates of individual differences in multimodal emotion recognition ability
    Laukka, Petri
    Mansson, Kristoffer N. T.
    Cortes, Diana S.
    Manzouri, Amirhossein
    Frick, Andreas
    Fredborg, William
    Fischer, Hakan
    CORTEX, 2024, 175 : 1 - 11
  • [24] Emotion Recognition using Multimodal Residual LSTM Network
    Ma, Jiaxin
    Tang, Hao
    Zheng, Wei-Long
    Lu, Bao-Liang
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 176 - 183
  • [25] Speaker Attentive Speech Emotion Recognition
    Le Moine, Clement
    Obin, Nicolas
    Roebel, Axel
    INTERSPEECH 2021, 2021, : 2866 - 2870
  • [26] Speaker-aware cognitive network with cross-modal attention for multimodal emotion recognition in conversation
    Guo, Lili
    Song, Yikang
    Ding, Shifei
    KNOWLEDGE-BASED SYSTEMS, 2024, 296
  • [27] Attentive Cross-modal Connections for Deep Multimodal Wearable-based Emotion Recognition
    Bhatti, Anubhav
    Behinaein, Behnam
    Rodenburg, Dirk
    Hungler, Paul
    Etemad, Ali
    2021 9TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION WORKSHOPS AND DEMOS (ACIIW), 2021,
  • [28] Audio-Video Fusion with Double Attention for Multimodal Emotion Recognition
    Mocanu, Bogdan
    Tapu, Ruxandra
    2022 IEEE 14TH IMAGE, VIDEO, AND MULTIDIMENSIONAL SIGNAL PROCESSING WORKSHOP (IVMSP), 2022,
  • [29] m_AutNet-A Framework for Personalized Multimodal Emotion Recognition in Autistic Children
    Kurian, Asha
    Tripathi, Shikha
    IEEE ACCESS, 2025, 13 : 1651 - 1662
  • [30] Speech Emotion Recognition via an Attentive Time-Frequency Neural Network
    Lu, Cheng
    Zheng, Wenming
    Lian, Hailun
    Zong, Yuan
    Tang, Chuangao
    Li, Sunan
    Zhao, Yan
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2023, 10 (06) : 3159 - 3168