MF-Net: a multimodal fusion network for emotion recognition based on multiple physiological signals

Cited by: 0

Authors:

Zhu, Lei [1]

Ding, Yu [1]

Huang, Aiai [1]

Tan, Xufei [2]

Zhang, Jianhai [3,4]

Affiliations:

[1] Hangzhou Dianzi Univ, Sch Automat, Hangzhou 310000, Peoples R China

[2] Hangzhou City Univ, Sch Med, Hangzhou 310015, Peoples R China

[3] Hangzhou Dianzi Univ, Sch Comp Sci, Hangzhou 310000, Peoples R China

[4] Hangzhou City Univ, Key Lab Brain Machine Collaborat Intelligence Zhej, Hangzhou 310015, Peoples R China

Source:

SIGNAL IMAGE AND VIDEO PROCESSING | 2025 / Vol. 19 / Issue 01

Keywords:

Deep learning; Physiological signal; Multimodal fusion; Emotion recognition; EEG

DOI:

10.1007/s11760-024-03632-0

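Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronic technology, communication technology]

Discipline classification codes:

0808; 0809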

Abstract:

Research on emotion recognition has shown that multi-modal data fusion improves the accuracy and robustness of human emotion recognition and outperforms single-modal methods. Despite the promising results of existing methods, significant challenges remain in fusing data from multiple modalities effectively. First, existing works tend to focus on generating a joint representation by fusing multi-modal data, and few methods consider the specific characteristics of each modality. Second, most methods fail to fully capture the intricate correlations among modalities, often resorting to simplistic combinations of latent features. To address these challenges, we propose a novel fusion network for multi-modal emotion recognition that enhances the efficacy of multi-modal fusion while preserving the distinct characteristics of each modality. Specifically, a dual-stream multi-scale feature encoding (MFE) module is designed to extract emotional information from temporal slices of both electroencephalogram (EEG) and peripheral physiological signals (PPS). Subsequently, a cross-modal global-local feature fusion module (CGFFM) is proposed to integrate global and local information from multi-modal data and then assign a different importance to each modality, so that the fused representation leans toward the more important modalities. Meanwhile, a transformer module is employed to further learn modality-specific information. Moreover, we introduce the adaptive collaboration block (ACB), which leverages both modality-specific and cross-modality relations for enhanced integration and feature representation. In extensive experiments on the DEAP and DREAMER multimodal datasets, our model achieves state-of-the-art performance.
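
The abstract mentions assigning a different importance to each modality so that the fused result leans toward the more important ones. The snippet below is only a minimal illustrative sketch of that general idea (softmax-weighted fusion of per-modality feature vectors), not the authors' CGFFM, whose details are not given in this record. The function names `softmax` and `fuse_modalities`, the example feature values, and the importance scores are all hypothetical; in a trained network the scores would be learned rather than supplied by hand.

```python
import math


def softmax(scores):
    """Turn raw importance scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(features, scores):
    """Weighted sum of equal-length feature vectors, one vector per modality.

    `features` is a list of feature vectors (hypothetical EEG and PPS
    embeddings) and `scores` holds one raw importance score per modality.
    Returns the fused vector and the normalized modality weights.
    """
    if len(features) != len(scores):
        raise ValueError("need exactly one score per modality")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all feature vectors must have the same length")
    weights = softmax(scores)
    fused = [
        sum(w * f[i] for w, f in zip(weights, features))
        for i in range(dim)
    ]
    return fused, weights


# Hypothetical embeddings and scores, chosen only for illustration.
eeg_features = [0.2, 0.8, -0.1]
pps_features = [0.5, 0.1, 0.3]
fused, weights = fuse_modalities([eeg_features, pps_features], [1.5, 0.5])
```

With a higher score for the first modality, `weights[0]` exceeds `weights[1]`, so `fused` lies closer to `eeg_features`; equal scores reduce the fusion to a plain average.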

Pages: 12
