Structure Aware Multi-Graph Network for Multi-Modal Emotion Recognition in Conversations

Cited by: 1
Authors
Zhang, Duzhen [1 ]
Chen, Feilong [1 ]
Chang, Jianlong [1 ]
Chen, Xiuyi [2 ]
Tian, Qi [1 ]
Affiliations
[1] Huawei Technol, Cloud & AI, Shenzhen 518129, Peoples R China
[2] Baidu Inc, Beijing 100085, Peoples R China
Keywords
Emotion recognition; Context modeling; Feature extraction; Visualization; Acoustics; Oral communication; Transformers; Structure learning; multi-graph network; dual-stream propagations; multi-modal fusion; emotion recognition in conversations
DOI
10.1109/TMM.2023.3238314
CLC Number
TP [Automation technology, computer technology]
Discipline Classification Code
0812
Abstract
Multi-Modal Emotion Recognition in Conversations (MMERC) is an increasingly active research field that leverages multi-modal signals to understand the feelings behind each utterance. Modeling contextual interactions and multi-modal fusion lie at the heart of this field, with graph-based models recently being widely used for MMERC to capture global multi-modal contextual information. However, these models generally mix all modality representations in a single graph, and utterances in each modality are fully connected, potentially ignoring three problems: 1) the heterogeneity of the multi-modal context, 2) the redundancy of contextual information, and 3) over-smoothing of the graph networks. To address these problems, we propose a Structure Aware Multi-Graph Network (SAMGN) for MMERC. Specifically, we construct multiple modality-specific graphs to model the heterogeneity of the multi-modal context. Instead of fully connecting the utterances in each modality, we design a structure learning module that determines whether edges exist between the utterances. This module reduces redundancy by forcing each utterance to focus on the contextual ones that contribute to its emotion recognition, acting like a message propagating reducer to alleviate over-smoothing. Then, we develop the SAMGN via Dual-Stream Propagation (DSP), which contains two propagation streams, i.e., intra- and inter-modal, performed in parallel to aggregate the heterogeneous modality information from multi-graphs. DSP also contains a gating unit that adaptively integrates the co-occurrence information from the above two propagations for emotion recognition. Experiments on two popular MMERC datasets demonstrate that SAMGN achieves new State-Of-The-Art (SOTA) results.
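The abstract describes two key mechanisms: a structure learning module that prunes the fully connected conversation graph, and Dual-Stream Propagation (DSP) with a gating unit that fuses intra- and inter-modal messages. The following is a minimal NumPy sketch of those two ideas only, not the authors' implementation; the similarity measure, threshold, and gate parameterization are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def learn_structure(feats, threshold=0.5):
    """Structure learning sketch: keep an edge between two utterances only
    when their sigmoid-squashed similarity exceeds a threshold, rather than
    fully connecting every utterance in the modality-specific graph."""
    sim = sigmoid(feats @ feats.T)                # pairwise utterance similarity
    adj = (sim > threshold).astype(float)         # sparse, learned adjacency
    np.fill_diagonal(adj, 1.0)                    # keep self-loops
    return adj / adj.sum(axis=1, keepdims=True)   # row-normalize for averaging

def dual_stream_step(feats_a, feats_b, adj_a, gate_w):
    """One DSP step for modality A: intra-modal aggregation over A's learned
    graph, an inter-modal message from modality B at the aligned utterance
    positions, and a per-dimension gate that fuses the two streams."""
    intra = adj_a @ feats_a                       # intra-modal propagation
    inter = feats_b                               # inter-modal message (aligned utterances)
    g = sigmoid(np.concatenate([intra, inter], axis=1) @ gate_w)
    return g * intra + (1.0 - g) * inter          # gated co-occurrence fusion
```

Thresholding the similarity matrix acts as the "message propagating reducer" the abstract mentions: each utterance aggregates only from contextual utterances deemed relevant, which limits redundancy and slows the over-smoothing that fully connected graph layers exhibit.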
Pages: 3987-3997 (11 pages)