Transformer-based correlation mining network with self-supervised label generation for multimodal sentiment analysis

Times cited: 1
Authors
Wang, Ruiqing [1]
Yang, Qimeng [1]
Tian, Shengwei [1]
Yu, Long [2]
He, Xiaoyu [3]
Wang, Bo [1]
Affiliations
[1] Xinjiang Univ, Sch Software, Urumqi, Xinjiang, Peoples R China
[2] Xinjiang Univ, Network & Informat Ctr, Network, Xinjiang, Peoples R China
[3] Xinjiang Univ, Coll Informat Sci & Engn, Urumqi 830000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal sentiment analysis; Transformer; Multimodal fusion; Collaborative learning; FUSION;
DOI
10.1016/j.neucom.2024.129163
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multimodal Sentiment Analysis (MSA), which aims to recognize and understand a speaker's sentiment state by integrating information from natural language, facial expressions, and voice, has gained much attention in recent years. However, modeling multimodal data poses two main challenges: 1) potential sentiment correlations exist both between modalities and within the context, which makes deep sentiment correlation mining and information fusion difficult; 2) sentiment information tends to be unevenly distributed across modalities, which makes it hard to fully leverage each modality for collaborative learning. To address these challenges, we propose CMLG, a method based on correlation mining and label generation. CMLG uses a Squeeze-and-Excitation Network (SEN) to recalibrate modality features and employs Transformer-based intra-modal and inter-modal feature extractors to mine the intrinsic connections between modalities. In addition, we design a Self-Supervised Label Generation Module (SLGM) that relies on the positive correlation between feature distances and label offsets to generate unimodal labels, and we jointly train the multimodal and unimodal tasks to detect sentiment differences between modalities. Extensive experiments on three benchmark datasets (MOSI, MOSEI, and SIMS) show that CMLG achieves excellent results.
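The abstract does not give SLGM's equations. As an illustration only, the sketch below follows the unimodal label-generation rule of Self-MM (Yu et al., AAAI 2021), which rests on the same assumption that label offsets track feature-distance offsets; whether CMLG uses exactly this rule is an assumption. The function names (`class_centers`, `relative_distance`, `generate_unimodal_labels`) are hypothetical. Class centers are computed within one batch from the multimodal labels for simplicity, and the default clip range [-3, 3] matches the MOSI/MOSEI label scale (SIMS labels lie in [-1, 1]).

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def class_centers(feats: Sequence[Vector], labels: Sequence[float]) -> Tuple[Vector, Vector]:
    """Mean feature vector of the positive-label and negative-label samples."""
    pos = [f for f, y in zip(feats, labels) if y > 0]
    neg = [f for f, y in zip(feats, labels) if y < 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    dim = len(feats[0])
    c_pos = [sum(f[k] for f in pos) / len(pos) for k in range(dim)]
    c_neg = [sum(f[k] for f in neg) / len(neg) for k in range(dim)]
    return c_pos, c_neg


def relative_distance(feat: Vector, c_pos: Vector, c_neg: Vector, eps: float = 1e-8) -> float:
    """alpha = (D_neg - D_pos) / (D_pos + eps), squared L2 distances scaled by sqrt(dim).

    Positive alpha means the feature sits closer to the positive center.
    """
    scale = math.sqrt(len(feat))
    d_pos = sum((a - b) ** 2 for a, b in zip(feat, c_pos)) / scale
    d_neg = sum((a - b) ** 2 for a, b in zip(feat, c_neg)) / scale
    return (d_neg - d_pos) / (d_pos + eps)


def generate_unimodal_labels(
    fused_feats: Sequence[Vector],
    uni_feats: Sequence[Vector],
    y_m: Sequence[float],
    label_range: Tuple[float, float] = (-3.0, 3.0),
    eps: float = 1e-8,
) -> List[float]:
    """Shift each multimodal label y_m by an offset driven by alpha_s - alpha_m (hypothetical helper)."""
    # Simplification (assumption): both sets of centers are split by the multimodal labels of this batch.
    cm_pos, cm_neg = class_centers(fused_feats, y_m)
    cu_pos, cu_neg = class_centers(uni_feats, y_m)
    lo, hi = label_range
    labels = []
    for f_m, f_s, y in zip(fused_feats, uni_feats, y_m):
        a_m = relative_distance(f_m, cm_pos, cm_neg, eps)
        a_s = relative_distance(f_s, cu_pos, cu_neg, eps)
        # Self-MM-style offset: the average of the ratio rule (y_s = y_m * a_s / a_m)
        # and the difference rule (y_s = y_m + a_s - a_m); eps only guards a_m == 0.
        offset = (a_s - a_m) * (y + a_m) / (2 * a_m + eps)
        labels.append(min(hi, max(lo, y + offset)))
    return labels


if __name__ == "__main__":
    y_m = [2.0, 1.0, -1.0, -2.0]
    fused = [[2.0], [1.0], [-1.0], [-2.0]]
    # Identical fused and unimodal features give zero offsets, so labels are unchanged.
    print(generate_unimodal_labels(fused, fused, y_m))  # [2.0, 1.0, -1.0, -2.0]
```

With one-dimensional toy features the offsets can be large, which is why the sketch clips generated labels to the dataset's label range; a real module would typically also smooth label updates across training steps.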
Pages: 9
Related papers (50 in total)
  • [1] Multi-level correlation mining framework with self-supervised label generation for multimodal sentiment analysis
    Li, Zuhe
    Guo, Qingbing
    Pan, Yushan
    Ding, Weiping
    Yu, Jun
    Zhang, Yazhou
    Liu, Weihua
    Chen, Haoran
    Wang, Hao
    Xie, Ying
    INFORMATION FUSION, 2023, 99
  • [2] Transformer-Based Self-Supervised Multimodal Representation Learning for Wearable Emotion Recognition
    Wu, Yujin
    Daoudi, Mohamed
    Amad, Ali
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2024, 15 (01) : 157 - 172
  • [3] Self-supervised Correlation Mining Network for Person Image Generation
    Wang, Zijian
    Qi, Xingqun
    Yuan, Kun
    Sun, Muyi
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 7693 - 7702
  • [4] Self-Supervised Unimodal Label Generation Strategy Using Recalibrated Modality Representations for Multimodal Sentiment Analysis
    Hwang, Yewon
    Kim, Jong-Hwan
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 35 - 46
  • [5] Transformer-based Feature Reconstruction Network for Robust Multimodal Sentiment Analysis
    Yuan, Ziqi
    Li, Wei
    Xu, Hua
    Yu, Wenmeng
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 4400 - 4407
  • [6] An autoencoder-based self-supervised learning for multimodal sentiment analysis
    Feng, Wenjun
    Wang, Xin
    Cao, Donglin
    Lin, Dazhen
    INFORMATION SCIENCES, 2024, 675
  • [7] Transformer-Based Self-Supervised Learning for Emotion Recognition
    Vazquez-Rodriguez, Juan
    Lefebvre, Gregoire
    Cumin, Julien
    Crowley, James L.
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 2605 - 2612
  • [8] Sentiment Knowledge Enhanced Self-supervised Learning for Multimodal Sentiment Analysis
    Qian, Fan
    Han, Jiqing
    He, Yongjun
    Zheng, Tieran
    Zheng, Guibin
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 12966 - 12978
  • [9] TEDT: Transformer-Based Encoding–Decoding Translation Network for Multimodal Sentiment Analysis
    Wang, Fan
    Tian, Shengwei
    Yu, Long
    Liu, Jing
    Wang, Junwen
    Li, Kun
    Wang, Yongtao
    COGNITIVE COMPUTATION, 2023, 15 : 289 - 303
  • [10] Transformer-Based Self-Supervised Monocular Depth and Visual Odometry
    Zhao, Hongru
    Qiao, Xiuquan
    Ma, Yi
    Tafazolli, Rahim
    IEEE SENSORS JOURNAL, 2023, 23 (02) : 1436 - 1446