Transformer-based correlation mining network with self-supervised label generation for multimodal sentiment analysis

Cited by: 1
Authors
Wang, Ruiqing [1]
Yang, Qimeng [1]
Tian, Shengwei [1]
Yu, Long [2]
He, Xiaoyu [3]
Wang, Bo [1]
Affiliations
[1] Xinjiang Univ, Sch Software, Urumqi, Xinjiang, Peoples R China
[2] Xinjiang Univ, Network & Informat Ctr, Network, Xinjiang, Peoples R China
[3] Xinjiang Univ, Coll Informat Sci & Engn, Urumqi 830000, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multimodal sentiment analysis; Transformer; Multimodal fusion; Collaborative learning; FUSION;
DOI
10.1016/j.neucom.2024.129163
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multimodal Sentiment Analysis (MSA), which aims to recognize and understand a speaker's sentiment state by integrating information from natural language, facial expressions, and voice, has gained much attention in recent years. However, modeling multimodal data poses two main challenges: 1) potential sentiment correlations exist both between modalities and within context, which makes deep-level sentiment correlation mining and information fusion difficult; 2) sentiment information tends to be unevenly distributed across modalities, which makes it hard to fully leverage each modality for collaborative learning. To address these challenges, we propose CMLG, a method based on correlation mining and label generation. CMLG uses a Squeeze and Excitation Network (SEN) to recalibrate modality features and employs Transformer-based intra-modal and inter-modal feature extractors to mine the intrinsic connections between different modalities. In addition, we design a Self-Supervised Label Generation Module (SLGM) that relies on the positive correlation between feature distances and label offsets to generate single-peak labels, and we jointly train the multi-peak and single-peak tasks to detect sentiment differences. Extensive experiments on three benchmark datasets (MOSI, MOSEI and SIMS) show that CMLG achieves excellent results.
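The abstract gives no implementation details for the SEN step, so the following is only a minimal sketch of generic squeeze-and-excitation recalibration, not the authors' network. It assumes a single modality represented as a (time steps x channels) feature sequence; the function name `squeeze_excite`, the reduction ratio, and the random weights are all hypothetical and stand in for learned parameters.

```python
import math
import random


def squeeze_excite(x, w1, w2):
    """Generic squeeze-and-excitation over a (time, channel) feature sequence.

    x  : list of time steps, each a list of channel values
    w1 : bottleneck weights, shape (channels // r, channels)   (assumed learned)
    w2 : expansion weights, shape (channels, channels // r)    (assumed learned)
    Returns the recalibrated features and the per-channel gates.
    """
    n_steps, n_channels = len(x), len(x[0])
    # Squeeze: summarize each channel by its mean over time.
    z = [sum(row[c] for row in x) / n_steps for c in range(n_channels)]
    # Excitation: bottleneck layer with ReLU, then expansion layer with sigmoid.
    hidden = [max(0.0, sum(w * v for w, v in zip(w_row, z))) for w_row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(w_row, hidden))))
        for w_row in w2
    ]
    # Scale: multiply every time step by the per-channel gate.
    out = [[row[c] * gates[c] for c in range(n_channels)] for row in x]
    return out, gates


# Hypothetical toy input: 5 time steps, 4 channels, reduction ratio 2.
rng = random.Random(0)
T, C, R = 5, 4, 2
features = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(T)]
w1 = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(C // R)]
w2 = [[rng.uniform(-1, 1) for _ in range(C // R)] for _ in range(C)]
recalibrated, gates = squeeze_excite(features, w1, w2)
```

Each gate lies strictly between 0 and 1, so the step can only attenuate a channel relative to the others; in a multimodal setting this would typically be applied to each modality's features before the intra-modal and inter-modal extractors.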
Pages: 9