Structure Aware Multi-Graph Network for Multi-Modal Emotion Recognition in Conversations

Cited by: 1
Authors
Zhang, Duzhen [1 ]
Chen, Feilong [1 ]
Chang, Jianlong [1 ]
Chen, Xiuyi [2 ]
Tian, Qi [1 ]
Affiliations
[1] Huawei Technol, Cloud & AI, Shenzhen 518129, Peoples R China
[2] Baidu Inc, Beijing 100085, Peoples R China
Keywords
Emotion recognition; Context modeling; Feature extraction; Visualization; Acoustics; Oral communication; Transformers; Structure learning; multi-graph network; dual-stream propagations; multi-modal fusion; emotion recognition in conversations
DOI
10.1109/TMM.2023.3238314
CLC Number
TP [Automation technology, computer technology]
Discipline Classification Code
0812
Abstract
Multi-Modal Emotion Recognition in Conversations (MMERC) is an increasingly active research field that leverages multi-modal signals to understand the feelings behind each utterance. Modeling contextual interactions and multi-modal fusion lie at the heart of this field, with graph-based models recently being widely used for MMERC to capture global multi-modal contextual information. However, these models generally mix all modality representations in a single graph, and utterances in each modality are fully connected, potentially ignoring three problems: 1) the heterogeneity of the multi-modal context, 2) the redundancy of contextual information, and 3) over-smoothing of the graph networks. To address these problems, we propose a Structure Aware Multi-Graph Network (SAMGN) for MMERC. Specifically, we construct multiple modality-specific graphs to model the heterogeneity of the multi-modal context. Instead of fully connecting the utterances in each modality, we design a structure learning module that determines whether edges exist between the utterances. This module reduces redundancy by forcing each utterance to focus on the contextual ones that contribute to its emotion recognition, acting like a message propagating reducer to alleviate over-smoothing. Then, we develop the SAMGN via Dual-Stream Propagation (DSP), which contains two propagation streams, i.e., intra- and inter-modal, performed in parallel to aggregate the heterogeneous modality information from multi-graphs. DSP also contains a gating unit that adaptively integrates the co-occurrence information from the above two propagations for emotion recognition. Experiments on two popular MMERC datasets demonstrate that SAMGN achieves new State-Of-The-Art (SOTA) results.
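The abstract describes two key mechanisms: a structure learning module that prunes the fully connected conversation graph, and Dual-Stream Propagation (DSP) with a gating unit that fuses intra- and inter-modal messages. The following is a minimal NumPy sketch of those two ideas only, not the authors' implementation; the similarity measure, threshold, and gate parameterization are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def learn_structure(feats, threshold=0.5):
    """Structure learning sketch: keep an edge between two utterances only
    when their sigmoid-squashed similarity exceeds a threshold, rather than
    fully connecting every utterance in the modality-specific graph."""
    sim = sigmoid(feats @ feats.T)                # pairwise utterance similarity
    adj = (sim > threshold).astype(float)         # sparse, learned adjacency
    np.fill_diagonal(adj, 1.0)                    # keep self-loops
    return adj / adj.sum(axis=1, keepdims=True)   # row-normalize for averaging

def dual_stream_step(feats_a, feats_b, adj_a, gate_w):
    """One DSP step for modality A: intra-modal aggregation over A's learned
    graph, an inter-modal message from modality B at the aligned utterance
    positions, and a per-dimension gate that fuses the two streams."""
    intra = adj_a @ feats_a                       # intra-modal propagation
    inter = feats_b                               # inter-modal message (aligned utterances)
    g = sigmoid(np.concatenate([intra, inter], axis=1) @ gate_w)
    return g * intra + (1.0 - g) * inter          # gated co-occurrence fusion
```

Thresholding the similarity matrix acts as the "message propagating reducer" the abstract mentions: each utterance aggregates only from contextual utterances deemed relevant, which limits redundancy and slows the over-smoothing that fully connected graph layers exhibit.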
Pages: 3987-3997 (11 pages)