Structure Aware Multi-Graph Network for Multi-Modal Emotion Recognition in Conversations

Citations: 1
Authors
Zhang, Duzhen [1 ]
Chen, Feilong [1 ]
Chang, Jianlong [1 ]
Chen, Xiuyi [2 ]
Tian, Qi [1 ]
Affiliations
[1] Huawei Technologies, Cloud & AI, Shenzhen 518129, People's Republic of China
[2] Baidu Inc, Beijing 100085, People's Republic of China
Keywords
Emotion recognition; Context modeling; Feature extraction; Visualization; Acoustics; Oral communication; Transformers; Structure learning; Multi-graph network; Dual-stream propagations; Multi-modal fusion; Emotion recognition in conversations
DOI
10.1109/TMM.2023.3238314
Chinese Library Classification (CLC)
TP [automation technology; computer technology]
Discipline Classification Code
0812
Abstract
Multi-Modal Emotion Recognition in Conversations (MMERC) is an increasingly active research field that leverages multi-modal signals to understand the feelings behind each utterance. Modeling contextual interactions and multi-modal fusion lies at the heart of this field, and graph-based models have recently been widely used in MMERC to capture global multi-modal contextual information. However, these models generally mix all modality representations in a single graph and fully connect the utterances within each modality, which overlooks three problems: 1) the heterogeneity of the multi-modal context, 2) the redundancy of contextual information, and 3) over-smoothing of the graph networks. To address these problems, we propose a Structure Aware Multi-Graph Network (SAMGN) for MMERC. Specifically, we construct multiple modality-specific graphs to model the heterogeneity of the multi-modal context. Instead of fully connecting the utterances in each modality, we design a structure learning module that determines whether an edge should exist between two utterances. This module reduces redundancy by forcing each utterance to focus on the contextual utterances that contribute to its emotion recognition, acting like a message-propagation reducer that alleviates over-smoothing. Then, we develop the SAMGN via Dual-Stream Propagation (DSP), which contains two propagation streams, intra- and inter-modal, performed in parallel to aggregate heterogeneous modality information from the multiple graphs. DSP also contains a gating unit that adaptively integrates the co-occurrence information from these two propagations for emotion recognition. Experiments on two popular MMERC datasets demonstrate that SAMGN achieves new State-Of-The-Art (SOTA) results.
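To make the abstract's architecture concrete, below is a minimal PyTorch sketch of the two ideas it names: a structure learning module that scores utterance pairs rather than fully connecting them, and a gated dual-stream (intra-/inter-modal) propagation step. All class names, layer choices (bilinear edge scoring, mean aggregation, one-to-one cross-modal utterance alignment), and sizes are illustrative assumptions, not the authors' implementation; see the paper at the DOI above for the actual method.

import torch
import torch.nn as nn
import torch.nn.functional as F

class StructureLearner(nn.Module):
    # Scores every utterance pair within one modality and returns soft edge
    # weights, so the modality graph need not be fully connected
    # (hypothetical formulation of the abstract's structure learning module).
    def __init__(self, dim):
        super().__init__()
        self.scorer = nn.Bilinear(dim, dim, 1)

    def forward(self, h):                         # h: (N, dim) utterance features
        n = h.size(0)
        hi = h.unsqueeze(1).expand(n, n, h.size(-1))
        hj = h.unsqueeze(0).expand(n, n, h.size(-1))
        logits = self.scorer(hi, hj).squeeze(-1)  # (N, N) pairwise scores
        return torch.sigmoid(logits)              # soft adjacency; threshold to sparsify

class DualStreamLayer(nn.Module):
    # One round of parallel intra-/inter-modal message passing merged by a
    # gating unit -- a guess at the DSP idea, not the authors' code.
    def __init__(self, dim):
        super().__init__()
        self.intra = nn.Linear(dim, dim)
        self.inter = nn.Linear(dim, dim)
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, h, adj, h_other):
        # Intra-modal stream: mean-aggregate neighbors on the learned graph.
        deg = adj.sum(-1, keepdim=True).clamp(min=1e-6)
        m_intra = F.relu(self.intra(adj @ h / deg))
        # Inter-modal stream: message from the aligned utterance in the
        # other modality (assumes one-to-one utterance alignment).
        m_inter = F.relu(self.inter(h_other))
        # Gating unit adaptively mixes the two streams per dimension.
        g = torch.sigmoid(self.gate(torch.cat([m_intra, m_inter], dim=-1)))
        return g * m_intra + (1 - g) * m_inter

# Toy usage: 6 utterances with text and audio features of size 32.
torch.manual_seed(0)
h_text, h_audio = torch.randn(6, 32), torch.randn(6, 32)
adj_text = StructureLearner(32)(h_text)           # learned text-graph structure
fused = DualStreamLayer(32)(h_text, adj_text, h_audio)
print(fused.shape)                                # torch.Size([6, 32])

In the full model, several such layers would presumably be stacked per modality-specific graph and followed by an emotion classifier, with the soft adjacency thresholded or regularized toward sparsity during training.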
Pages: 3987-3997
Number of pages: 11