Structure Aware Multi-Graph Network for Multi-Modal Emotion Recognition in Conversations

Cited by: 1
Authors
Zhang, Duzhen [1 ]
Chen, Feilong [1 ]
Chang, Jianlong [1 ]
Chen, Xiuyi [2 ]
Tian, Qi [1 ]
Affiliations
[1] Huawei Technologies, Cloud & AI, Shenzhen 518129, People's Republic of China
[2] Baidu Inc, Beijing 100085, People's Republic of China
Keywords
Emotion recognition; Context modeling; Feature extraction; Visualization; Acoustics; Oral communication; Transformers; Structure learning; multi-graph network; dual-stream propagations; multi-modal fusion; emotion recognition in conversations
DOI
10.1109/TMM.2023.3238314
CLC Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Multi-Modal Emotion Recognition in Conversations (MMERC) is an increasingly active research field that leverages multi-modal signals to understand the feelings behind each utterance. Modeling contextual interactions and multi-modal fusion lie at the heart of this field, with graph-based models recently being widely used for MMERC to capture global multi-modal contextual information. However, these models generally mix all modality representations in a single graph, and utterances in each modality are fully connected, potentially ignoring three problems: 1) the heterogeneity of the multi-modal context, 2) the redundancy of contextual information, and 3) over-smoothing of the graph networks. To address these problems, we propose a Structure Aware Multi-Graph Network (SAMGN) for MMERC. Specifically, we construct multiple modality-specific graphs to model the heterogeneity of the multi-modal context. Instead of fully connecting the utterances in each modality, we design a structure learning module that determines whether edges exist between the utterances. This module reduces redundancy by forcing each utterance to focus on the contextual ones that contribute to its emotion recognition, acting like a message propagating reducer to alleviate over-smoothing. Then, we develop the SAMGN via Dual-Stream Propagation (DSP), which contains two propagation streams, i.e., intra- and inter-modal, performed in parallel to aggregate the heterogeneous modality information from multi-graphs. DSP also contains a gating unit that adaptively integrates the co-occurrence information from the above two propagations for emotion recognition. Experiments on two popular MMERC datasets demonstrate that SAMGN achieves new State-Of-The-Art (SOTA) results.
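To make the abstract's pipeline concrete, the following is a minimal, hypothetical PyTorch sketch of the two ideas it names: a structure-learning module that prunes utterance-to-utterance edges in each modality-specific graph, and a dual-stream layer that runs intra- and inter-modal propagation in parallel and merges the two streams with a gating unit. The class names, the sigmoid thresholding rule, and the mean-pooled inter-modal input are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StructureLearner(nn.Module):
    """Scores every utterance pair in one modality and keeps only the edges
    whose predicted relevance passes a threshold (an assumed pruning rule)."""

    def __init__(self, dim: int, threshold: float = 0.5):
        super().__init__()
        self.scorer = nn.Linear(2 * dim, 1)
        self.threshold = threshold

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (N, dim) utterance features; returns a weighted (N, N) adjacency.
        n = h.size(0)
        pairs = torch.cat(
            [h.unsqueeze(1).expand(n, n, -1),   # sender features
             h.unsqueeze(0).expand(n, n, -1)],  # receiver features
            dim=-1)
        probs = torch.sigmoid(self.scorer(pairs)).squeeze(-1)
        return probs * (probs > self.threshold)  # prune weak edges


class DualStreamLayer(nn.Module):
    """One round of parallel intra-/inter-modal propagation plus a gate."""

    def __init__(self, dim: int):
        super().__init__()
        self.intra = nn.Linear(dim, dim)
        self.inter = nn.Linear(dim, dim)
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, h_m, adj_m, h_other):
        # Intra-modal stream: aggregate over the learned modality graph.
        intra = F.relu(self.intra(adj_m @ h_m))
        # Inter-modal stream: a mean of the other modalities' aligned
        # utterance features stands in for cross-graph message passing.
        inter = F.relu(self.inter(h_other))
        # Gating unit: adaptively mix the two streams per feature dimension.
        g = torch.sigmoid(self.gate(torch.cat([intra, inter], dim=-1)))
        return g * intra + (1.0 - g) * inter


# Hypothetical usage with three modality streams (text, audio, visual).
N, D = 8, 16
h_t, h_a, h_v = torch.randn(N, D), torch.randn(N, D), torch.randn(N, D)
adj_t = StructureLearner(D)(h_t)                # learned sparse text graph
layer = DualStreamLayer(D)
fused_t = layer(h_t, adj_t, (h_a + h_v) / 2)    # text view after one DSP step
```

In the full model as described, each modality would presumably get its own structure learner and several such layers would be stacked, with the gated outputs of all modalities fed to an emotion classifier; this sketch shows only a single step for one modality.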
Pages: 3987-3997 (11 pages)