Structure Aware Multi-Graph Network for Multi-Modal Emotion Recognition in Conversations

Citations: 1
Authors
Zhang, Duzhen [1 ]
Chen, Feilong [1 ]
Chang, Jianlong [1 ]
Chen, Xiuyi [2 ]
Tian, Qi [1 ]
Affiliations
[1] Huawei Technologies, Cloud & AI, Shenzhen 518129, People's Republic of China
[2] Baidu Inc, Beijing 100085, People's Republic of China
Keywords
Emotion recognition; Context modeling; Feature extraction; Visualization; Acoustics; Oral communication; Transformers; Structure learning; Multi-graph network; Dual-stream propagations; Multi-modal fusion; Emotion recognition in conversations
DOI
10.1109/TMM.2023.3238314
Chinese Library Classification (CLC)
TP [automation technology; computer technology]
Discipline Classification Code
0812
Abstract
Multi-Modal Emotion Recognition in Conversations (MMERC) is an increasingly active research field that leverages multi-modal signals to understand the feelings behind each utterance. Modeling contextual interactions and multi-modal fusion lies at the heart of this field, and graph-based models have recently been widely used in MMERC to capture global multi-modal contextual information. However, these models generally mix all modality representations in a single graph and fully connect the utterances within each modality, which overlooks three problems: 1) the heterogeneity of the multi-modal context, 2) the redundancy of contextual information, and 3) over-smoothing of the graph networks. To address these problems, we propose a Structure Aware Multi-Graph Network (SAMGN) for MMERC. Specifically, we construct multiple modality-specific graphs to model the heterogeneity of the multi-modal context. Instead of fully connecting the utterances in each modality, we design a structure learning module that determines whether an edge should exist between two utterances. This module reduces redundancy by forcing each utterance to focus on the contextual utterances that contribute to its emotion recognition, acting like a message-propagation reducer that alleviates over-smoothing. Then, we develop the SAMGN via Dual-Stream Propagation (DSP), which contains two propagation streams, intra- and inter-modal, performed in parallel to aggregate heterogeneous modality information from the multiple graphs. DSP also contains a gating unit that adaptively integrates the co-occurrence information from these two propagations for emotion recognition. Experiments on two popular MMERC datasets demonstrate that SAMGN achieves new State-Of-The-Art (SOTA) results.
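To make the abstract's architecture concrete, below is a minimal PyTorch sketch of the two ideas it names: a structure learning module that scores utterance pairs rather than fully connecting them, and a gated dual-stream (intra-/inter-modal) propagation step. All class names, layer choices (bilinear edge scoring, mean aggregation, one-to-one cross-modal utterance alignment), and sizes are illustrative assumptions, not the authors' implementation; see the paper at the DOI above for the actual method.

import torch
import torch.nn as nn
import torch.nn.functional as F

class StructureLearner(nn.Module):
    # Scores every utterance pair within one modality and returns soft edge
    # weights, so the modality graph need not be fully connected
    # (hypothetical formulation of the abstract's structure learning module).
    def __init__(self, dim):
        super().__init__()
        self.scorer = nn.Bilinear(dim, dim, 1)

    def forward(self, h):                         # h: (N, dim) utterance features
        n = h.size(0)
        hi = h.unsqueeze(1).expand(n, n, h.size(-1))
        hj = h.unsqueeze(0).expand(n, n, h.size(-1))
        logits = self.scorer(hi, hj).squeeze(-1)  # (N, N) pairwise scores
        return torch.sigmoid(logits)              # soft adjacency; threshold to sparsify

class DualStreamLayer(nn.Module):
    # One round of parallel intra-/inter-modal message passing merged by a
    # gating unit -- a guess at the DSP idea, not the authors' code.
    def __init__(self, dim):
        super().__init__()
        self.intra = nn.Linear(dim, dim)
        self.inter = nn.Linear(dim, dim)
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, h, adj, h_other):
        # Intra-modal stream: mean-aggregate neighbors on the learned graph.
        deg = adj.sum(-1, keepdim=True).clamp(min=1e-6)
        m_intra = F.relu(self.intra(adj @ h / deg))
        # Inter-modal stream: message from the aligned utterance in the
        # other modality (assumes one-to-one utterance alignment).
        m_inter = F.relu(self.inter(h_other))
        # Gating unit adaptively mixes the two streams per dimension.
        g = torch.sigmoid(self.gate(torch.cat([m_intra, m_inter], dim=-1)))
        return g * m_intra + (1 - g) * m_inter

# Toy usage: 6 utterances with text and audio features of size 32.
torch.manual_seed(0)
h_text, h_audio = torch.randn(6, 32), torch.randn(6, 32)
adj_text = StructureLearner(32)(h_text)           # learned text-graph structure
fused = DualStreamLayer(32)(h_text, adj_text, h_audio)
print(fused.shape)                                # torch.Size([6, 32])

In the full model, several such layers would presumably be stacked per modality-specific graph and followed by an emotion classifier, with the soft adjacency thresholded or regularized toward sparsity during training.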
Pages: 3987-3997
Number of pages: 11