Structure Aware Multi-Graph Network for Multi-Modal Emotion Recognition in Conversations

Cited by: 1
Authors
Zhang, Duzhen [1 ]
Chen, Feilong [1 ]
Chang, Jianlong [1 ]
Chen, Xiuyi [2 ]
Tian, Qi [1 ]
Affiliations
[1] Huawei Technologies, Cloud & AI, Shenzhen 518129, People's Republic of China
[2] Baidu Inc, Beijing 100085, People's Republic of China
Keywords
Emotion recognition; Context modeling; Feature extraction; Visualization; Acoustics; Oral communication; Transformers; Structure learning; multi-graph network; dual-stream propagations; multi-modal fusion; emotion recognition in conversations
DOI
10.1109/TMM.2023.3238314
CLC Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Multi-Modal Emotion Recognition in Conversations (MMERC) is an increasingly active research field that leverages multi-modal signals to understand the feelings behind each utterance. Modeling contextual interactions and multi-modal fusion lie at the heart of this field, with graph-based models recently being widely used for MMERC to capture global multi-modal contextual information. However, these models generally mix all modality representations in a single graph, and utterances in each modality are fully connected, potentially ignoring three problems: 1) the heterogeneity of the multi-modal context, 2) the redundancy of contextual information, and 3) over-smoothing of the graph networks. To address these problems, we propose a Structure Aware Multi-Graph Network (SAMGN) for MMERC. Specifically, we construct multiple modality-specific graphs to model the heterogeneity of the multi-modal context. Instead of fully connecting the utterances in each modality, we design a structure learning module that determines whether edges exist between the utterances. This module reduces redundancy by forcing each utterance to focus on the contextual ones that contribute to its emotion recognition, acting like a message propagating reducer to alleviate over-smoothing. Then, we develop the SAMGN via Dual-Stream Propagation (DSP), which contains two propagation streams, i.e., intra- and inter-modal, performed in parallel to aggregate the heterogeneous modality information from multi-graphs. DSP also contains a gating unit that adaptively integrates the co-occurrence information from the above two propagations for emotion recognition. Experiments on two popular MMERC datasets demonstrate that SAMGN achieves new State-Of-The-Art (SOTA) results.
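To make the abstract's pipeline concrete, the following is a minimal, hypothetical PyTorch sketch of the two ideas it names: a structure-learning module that prunes utterance-to-utterance edges in each modality-specific graph, and a dual-stream layer that runs intra- and inter-modal propagation in parallel and merges the two streams with a gating unit. The class names, the sigmoid thresholding rule, and the mean-pooled inter-modal input are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StructureLearner(nn.Module):
    """Scores every utterance pair in one modality and keeps only the edges
    whose predicted relevance passes a threshold (an assumed pruning rule)."""

    def __init__(self, dim: int, threshold: float = 0.5):
        super().__init__()
        self.scorer = nn.Linear(2 * dim, 1)
        self.threshold = threshold

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (N, dim) utterance features; returns a weighted (N, N) adjacency.
        n = h.size(0)
        pairs = torch.cat(
            [h.unsqueeze(1).expand(n, n, -1),   # sender features
             h.unsqueeze(0).expand(n, n, -1)],  # receiver features
            dim=-1)
        probs = torch.sigmoid(self.scorer(pairs)).squeeze(-1)
        return probs * (probs > self.threshold)  # prune weak edges


class DualStreamLayer(nn.Module):
    """One round of parallel intra-/inter-modal propagation plus a gate."""

    def __init__(self, dim: int):
        super().__init__()
        self.intra = nn.Linear(dim, dim)
        self.inter = nn.Linear(dim, dim)
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, h_m, adj_m, h_other):
        # Intra-modal stream: aggregate over the learned modality graph.
        intra = F.relu(self.intra(adj_m @ h_m))
        # Inter-modal stream: a mean of the other modalities' aligned
        # utterance features stands in for cross-graph message passing.
        inter = F.relu(self.inter(h_other))
        # Gating unit: adaptively mix the two streams per feature dimension.
        g = torch.sigmoid(self.gate(torch.cat([intra, inter], dim=-1)))
        return g * intra + (1.0 - g) * inter


# Hypothetical usage with three modality streams (text, audio, visual).
N, D = 8, 16
h_t, h_a, h_v = torch.randn(N, D), torch.randn(N, D), torch.randn(N, D)
adj_t = StructureLearner(D)(h_t)                # learned sparse text graph
layer = DualStreamLayer(D)
fused_t = layer(h_t, adj_t, (h_a + h_v) / 2)    # text view after one DSP step
```

In the full model as described, each modality would presumably get its own structure learner and several such layers would be stacked, with the gated outputs of all modalities fed to an emotion classifier; this sketch shows only a single step for one modality.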
Pages: 3987-3997 (11 pages)