Research on cross-modal emotion recognition based on multi-layer semantic fusion

Cited by: 0
Authors
Xu Z. [1 ]
Gao Y. [1 ]
Affiliations
[1] College of Information Engineering, Shanghai Maritime University, Shanghai
Funding
National Natural Science Foundation of China
Keywords
cascade encoder; inter-modal information complementation; Mask-gated Fusion Networks (MGF-module); multimodal emotion recognition; multimodal fusion
DOI
10.3934/mbe.2024110
Abstract
Multimodal emotion analysis integrates information from multiple modalities to better understand human emotions. In this paper, we propose the Cross-modal Emotion Recognition based on multi-layer semantic fusion (CM-MSF) model, which aims to exploit the complementarity of important information between modalities and to extract advanced features adaptively. To achieve comprehensive and rich feature extraction from multimodal sources across different dimensions and depth levels, we design a parallel deep learning module that extracts features from each individual modality and ensures cost-effective alignment of the extracted features. Furthermore, a cascaded cross-modal encoder module based on Bidirectional Long Short-Term Memory (BiLSTM) layers and 1D convolution (Conv1D) is introduced to facilitate inter-modal information complementation. This module enables the seamless integration of information across modalities and effectively addresses the challenges associated with signal heterogeneity. To enable flexible and adaptive information selection and delivery, we design the Mask-gated Fusion Networks (MGF-module), which combines masking with gating structures. This approach allows precise control over the information flow of each modality through gating vectors, mitigating the low recognition accuracy and emotional misjudgment caused by complex features and noisy, redundant information. The CM-MSF model was evaluated on the widely used multimodal emotion recognition datasets CMU-MOSI and CMU-MOSEI. The experimental results demonstrate the strong performance of the model, with binary classification accuracies of 89.1% and 88.6% and F1 scores of 87.9% and 88.1% on CMU-MOSI and CMU-MOSEI, respectively. These results validate the effectiveness of our approach in accurately recognizing and classifying emotions. © 2024 the Author(s), licensee AIMS Press.
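The abstract names two mechanisms without giving implementation detail: a cascaded BiLSTM + Conv1D cross-modal encoder and a mask-gated fusion step that scales each modality's features with a learned gating vector. The sketch below is one illustrative PyTorch reading of those two ideas, under stated assumptions: the class names (CascadedCrossModalEncoder, MaskGatedFusion), layer sizes, pooling, input dimensions, and fusion order are hypothetical choices, not the authors' published architecture.

# Minimal sketch of the two mechanisms named in the abstract; all sizes and
# names are illustrative assumptions, not the paper's exact design.
import torch
import torch.nn as nn

class CascadedCrossModalEncoder(nn.Module):
    """BiLSTM followed by a 1D convolution over the sequence dimension."""
    def __init__(self, in_dim: int, hidden_dim: int):
        super().__init__()
        self.bilstm = nn.LSTM(in_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        self.conv = nn.Conv1d(2 * hidden_dim, hidden_dim, kernel_size=3,
                              padding=1)

    def forward(self, x):                      # x: (batch, seq, in_dim)
        h, _ = self.bilstm(x)                  # (batch, seq, 2*hidden)
        h = self.conv(h.transpose(1, 2))       # (batch, hidden, seq)
        return h.mean(dim=2)                   # pooled utterance vector

class MaskGatedFusion(nn.Module):
    """Gate each modality's vector before summing into a fused vector."""
    def __init__(self, dim: int, n_modalities: int = 3):
        super().__init__()
        self.gate = nn.Linear(n_modalities * dim, n_modalities * dim)
        self.dim = dim

    def forward(self, feats, masks=None):      # feats: list of (batch, dim)
        cat = torch.cat(feats, dim=-1)
        g = torch.sigmoid(self.gate(cat))      # gating vector per modality
        gated = g * cat
        if masks is not None:                  # optional hard masks per modality
            gated = gated * torch.cat(masks, dim=-1)
        # sum the gated per-modality chunks into one fused representation
        return gated.view(-1, len(feats), self.dim).sum(dim=1)

if __name__ == "__main__":
    # one encoder per modality; input dims are toy values, not the paper's
    enc_t = CascadedCrossModalEncoder(in_dim=300, hidden_dim=128)  # text
    enc_a = CascadedCrossModalEncoder(in_dim=74, hidden_dim=128)   # audio
    enc_v = CascadedCrossModalEncoder(in_dim=35, hidden_dim=128)   # video
    fuse = MaskGatedFusion(dim=128)
    t, a, v = torch.randn(8, 50, 300), torch.randn(8, 50, 74), torch.randn(8, 50, 35)
    fused = fuse([enc_t(t), enc_a(a), enc_v(v)])
    print(fused.shape)                         # torch.Size([8, 128])

In this reading, the sigmoid-activated gating vector plays the role described in the abstract, scaling each modality's contribution, while optional hard masks can zero out unreliable segments before fusion.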
Pages: 2488-2514
Number of pages: 26
Related papers
50 records in total
  • [21] CFN-ESA: A Cross-Modal Fusion Network With Emotion-Shift Awareness for Dialogue Emotion Recognition
    Li, J.
    Wang, X.
    Liu, Y.
    Zeng, Z.
    IEEE Transactions on Affective Computing, 2024, 15 (04): 1-16
  • [22] A Multi-Level Circulant Cross-Modal Transformer for Multimodal Speech Emotion Recognition
    Gong, Peizhu
    Liu, Jin
    Wu, Zhongdai
    Han, Bing
    Wang, Y. Ken
    He, Huihua
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 74 (02): 4203-4220
  • [23] Electroglottograph-Based Speech Emotion Recognition via Cross-Modal Distillation
    Chen, Lijiang
    Ren, Jie
    Mao, Xia
    Zhao, Qi
    APPLIED SCIENCES-BASEL, 2022, 12 (09)
  • [24] Transformer-Based Cross-Modal Information Fusion Network for Semantic Segmentation
    Duan, Zaipeng
    Huang, Xiao
    Ma, Jie
    NEURAL PROCESSING LETTERS, 2023, 55 (05): 6361-6375
  • [25] Cross-modal credibility modelling for EEG-based multimodal emotion recognition
    Zhang, Yuzhe
    Liu, Huan
    Wang, Di
    Zhang, Dalin
    Lou, Tianyu
    Zheng, Qinghua
    Quek, Chai
    JOURNAL OF NEURAL ENGINEERING, 2024, 21 (02)
  • [27] Speech Emotion Recognition Using Global-Aware Cross-Modal Feature Fusion Network
    Li, Feng
    Luo, Jiusong
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, ICIC 2023, PT II, 2023, 14087: 211-221
  • [28] Online Asymmetric Metric Learning With Multi-Layer Similarity Aggregation for Cross-Modal Retrieval
    Wu, Yiling
    Wang, Shuhui
    Song, Guoli
    Huang, Qingming
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2019, 28 (09): 4299-4312
  • [29] UCEMA: Uni-modal and cross-modal encoding network based on multi-head attention for emotion recognition in conversation
    Zhao, Hongkun
    Liu, Siyuan
    Chen, Yang
    Kong, Fanmin
    Zeng, Qingtian
    Li, Kang
    MULTIMEDIA SYSTEMS, 2024, 30 (06)
  • [30] Cross-Modal Dynamic Transfer Learning for Multimodal Emotion Recognition
    Hong, Soyeon
    Kang, Hyeoungguk
    Cho, Hyunsouk
    IEEE ACCESS, 2024, 12: 14324-14333