Research on cross-modal emotion recognition based on multi-layer semantic fusion

Cited: 0
Authors
Xu Z. [1 ]
Gao Y. [1 ]
Affiliations
[1] College of Information Engineering, Shanghai Maritime University, Shanghai
Funding
National Natural Science Foundation of China;
Keywords
cascade encoder; inter-modal information complementation; Mask-gated Fusion Networks (MGF-module); multimodal emotion recognition; multimodal fusion;
DOI: 10.3934/mbe.2024110
Abstract
Multimodal emotion analysis integrates information from multiple modalities to better understand human emotions. In this paper, we propose the Cross-modal Emotion Recognition based on multi-layer semantic fusion (CM-MSF) model, which leverages the complementarity of salient information between modalities and extracts high-level features adaptively. To extract comprehensive and rich features from multimodal sources across different dimensions and depth levels, we design a parallel deep-learning module that extracts features from each individual modality while ensuring cost-effective alignment of the extracted features. A cascaded cross-modal encoder module based on Bidirectional Long Short-Term Memory (BiLSTM) layers and 1D convolution (Conv1d) is then introduced to facilitate inter-modal information complementation. This module enables the seamless integration of information across modalities, effectively addressing the challenges posed by signal heterogeneity. For flexible and adaptive information selection and delivery, we design the Mask-gated Fusion Networks (MGF-module), which combine masking with gating structures: gating vectors precisely control the information flow of each modality, mitigating the low recognition accuracy and emotional misjudgment caused by complex features and noisy, redundant information. The CM-MSF model was evaluated on the widely used multimodal emotion recognition datasets CMU-MOSI and CMU-MOSEI, achieving binary classification accuracies of 89.1% and 88.6% and F1 scores of 87.9% and 88.1%, respectively. These results validate the effectiveness of our approach in accurately recognizing and classifying emotions.
©2024 the Author(s), licensee AIMS Press.
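The mask-gated fusion idea described in the abstract can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration of the general mechanism, not the authors' implementation: a gating vector computed from the concatenated modality features controls, per dimension, how much of each modality passes into the fused representation, and a binary mask zeroes out padded time steps.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mask_gated_fusion(feat_a, feat_b, mask, W, b):
    """Fuse two modality feature sequences with a mask-gated mechanism.

    All shapes and parameter names are illustrative assumptions:
    feat_a, feat_b: (T, d) feature sequences from two modalities
    mask:           (T,) binary mask, 0 for padded time steps
    W, b:           gate parameters, W is (2d, d), b is (d,)
    """
    # Gating vector from the concatenated features, one gate per dimension.
    gate = sigmoid(np.concatenate([feat_a, feat_b], axis=-1) @ W + b)  # (T, d)
    # Convex combination of the two modalities, controlled by the gate.
    fused = gate * feat_a + (1.0 - gate) * feat_b
    # Masking suppresses padded (noisy, redundant) time steps entirely.
    return fused * mask[:, None]

T, d = 5, 4
a = rng.standard_normal((T, d))
b = rng.standard_normal((T, d))
mask = np.array([1, 1, 1, 0, 0], dtype=float)  # last two steps are padding
W = rng.standard_normal((2 * d, d)) * 0.1
bias = np.zeros(d)

out = mask_gated_fusion(a, b, mask, W, bias)
print(out.shape)  # (5, 4); the two masked rows are all zeros
```

Because the gate lies in (0, 1), each fused value sits elementwise between the two modality features, which is one simple way a gating vector can arbitrate between modalities before a downstream classifier.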
Pages: 2488–2514 (26 pages)
Related papers (50 items)
  • [1] Multi-corpus emotion recognition method based on cross-modal gated attention fusion
    Ryumina, Elena
    Ryumin, Dmitry
    Axyonov, Alexandr
    Ivanko, Denis
    Karpov, Alexey
    PATTERN RECOGNITION LETTERS, 2025, 190 : 192 - 200
  • [2] Cross-Modal Semantic Fusion Video Emotion Analysis Based on Attention Mechanism
    Zhao, Lianfen
    Pan, Zhengjun
    2023 8TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYTICS, ICCCBDA, 2023, : 381 - 386
  • [3] A Cross-Modal Correlation Fusion Network for Emotion Recognition in Conversations
    Tang, Xiaolyu
    Cai, Guoyong
    Chen, Ming
    Yuan, Peicong
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT V, NLPCC 2024, 2025, 15363 : 55 - 68
  • [4] Cross-modal dynamic convolution for multi-modal emotion recognition
    Wen, Huanglu
    You, Shaodi
    Fu, Ying
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2021, 78
  • [5] Deep Multi-Semantic Fusion-Based Cross-Modal Hashing
    Zhu, Xinghui
    Cai, Liewu
    Zou, Zhuoyang
    Zhu, Lei
    MATHEMATICS, 2022, 10 (03)
  • [6] A cross-modal fusion network based on graph feature learning for multimodal emotion recognition
    Cao Xiaopeng
    Zhang Linying
    Chen Qiuxian
    Ning Hailong
    Dong Yizhuo
    The Journal of China Universities of Posts and Telecommunications, 2024, 31 (06) : 16 - 25
  • [7] Contextual and Cross-Modal Interaction for Multi-Modal Speech Emotion Recognition
    Yang, Dingkang
    Huang, Shuai
    Liu, Yang
    Zhang, Lihua
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 2093 - 2097
  • [8] Metaphor recognition based on cross-modal multi-level information fusion
    Yang, Qimeng
    Yan, Yuanbo
    He, Xiaoyu
    Guo, Shisong
    COMPLEX & INTELLIGENT SYSTEMS, 2025, 11 (01)
  • [9] MemoCMT: multimodal emotion recognition using cross-modal transformer-based feature fusion
    Khan, Mustaqeem
    Tran, Phuong-Nam
    Pham, Nhat Truong
    El Saddik, Abdulmotaleb
    Othmani, Alice
    SCIENTIFIC REPORTS, 2025, 15 (01):
  • [10] Speaker-aware Cross-modal Fusion Architecture for Conversational Emotion Recognition
    Zhao, Huan
    Li, Bo
    Zhang, Zixing
    INTERSPEECH 2023, 2023, : 2718 - 2722