Multi-level correlation mining framework with self-supervised label generation for multimodal sentiment analysis

Cited by: 17
Authors
Li, Zuhe [1 ]
Guo, Qingbing [1 ]
Pan, Yushan [2 ]
Ding, Weiping [3 ]
Yu, Jun [1 ]
Zhang, Yazhou [4 ]
Liu, Weihua [5 ]
Chen, Haoran [1 ]
Wang, Hao [6 ]
Xie, Ying [7 ]
Affiliations
[1] Zhengzhou Univ Light Ind, Sch Comp & Commun Engn, Zhengzhou 450002, Peoples R China
[2] Xian Jiaotong Liverpool Univ, Sch Adv Technol, Dept Comp, Suzhou 215123, Peoples R China
[3] Nantong Univ, Sch Informat Sci & Technol, Nantong 226019, Peoples R China
[4] Zhengzhou Univ Light Ind, Coll Software Engn, Zhengzhou 450002, Peoples R China
[5] China Mobile Res Inst, Beijing 100053, Peoples R China
[6] Xidian Univ, Xian 710071, Peoples R China
[7] Putian Univ, Putian 351100, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal sentiment analysis; Unimodal feature fusion; Linguistic-guided transformer; Self-supervised label generation; INFORMATION FUSION; MECHANISM; LSTM;
DOI
10.1016/j.inffus.2023.101891
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Fusion and co-learning are major challenges in multimodal sentiment analysis. Most existing methods either ignore the basic relationships among modalities or fail to maximize their potential correlations, and they do not leverage knowledge from resource-rich modalities when analyzing resource-poor ones. To address these challenges, we propose a multimodal sentiment analysis method based on multi-level correlation mining and self-supervised multi-task learning. First, we propose a unimodal feature fusion- and linguistics-guided Transformer-based framework, the multi-level correlation mining framework, to overcome the difficulty of multimodal information fusion. The module exploits correlation information between modalities from low to high levels. Second, we divide the multimodal sentiment analysis task into one multimodal task and three unimodal tasks (linguistic, acoustic, and visual), and design a self-supervised label generation module (SLGM) to generate sentiment labels for the unimodal tasks. SLGM-based multi-task learning overcomes the lack of unimodal labels in co-learning. Through extensive experiments on the CMU-MOSI and CMU-MOSEI datasets, we demonstrate the superiority of the proposed multi-level correlation mining framework over state-of-the-art methods.
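The abstract's core idea of SLGM-based multi-task learning can be sketched as follows. This is a minimal illustration, not the paper's actual formulation: the function names, the distance-ratio labeling rule, and the task weights are all assumptions chosen to show how one annotated multimodal label can drive three self-supervised unimodal targets.

```python
def unimodal_pseudo_label(y_m, d_uni, d_multi, eps=1e-8):
    """Generate a unimodal sentiment label from the multimodal label y_m.

    Hypothetical rule: scale y_m by the ratio of the unimodal
    representation's distance to its class center (d_uni) over the fused
    representation's distance (d_multi). A modality whose representation
    sits closer to the center receives a proportionally weaker label.
    """
    return y_m * d_uni / (d_multi + eps)


def multitask_loss(preds, labels, weights):
    """Weighted sum of per-task squared errors over all tasks."""
    return sum(weights[k] * (preds[k] - labels[k]) ** 2 for k in preds)


# One human-annotated multimodal label drives three generated unimodal labels.
y_m = 1.5
labels = {
    "m": y_m,                                                  # annotated
    "l": unimodal_pseudo_label(y_m, d_uni=0.8, d_multi=1.0),  # linguistic
    "a": unimodal_pseudo_label(y_m, d_uni=1.2, d_multi=1.0),  # acoustic
    "v": unimodal_pseudo_label(y_m, d_uni=0.5, d_multi=1.0),  # visual
}
preds = {"m": 1.4, "l": 1.1, "a": 1.9, "v": 0.7}
weights = {"m": 1.0, "l": 0.5, "a": 0.5, "v": 0.5}  # unimodal tasks auxiliary
loss = multitask_loss(preds, labels, weights)
```

In this sketch the multimodal task keeps full weight while the three unimodal tasks act as down-weighted auxiliary objectives, which is the usual way such co-learning setups avoid letting noisy generated labels dominate training.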
Pages: 13
Related Papers
Total: 50
  • [1] Transformer-based correlation mining network with self-supervised label generation for multimodal sentiment analysis
    Wang, Ruiqing
    Yang, Qimeng
    Tian, Shengwei
    Yu, Long
    He, Xiaoyu
    Wang, Bo
    NEUROCOMPUTING, 2025, 618
  • [2] Self-Supervised Unimodal Label Generation Strategy Using Recalibrated Modality Representations for Multimodal Sentiment Analysis
    Hwang, Yewon
    Kim, Jong-Hwan
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 35 - 46
  • [3] Sentiment Knowledge Enhanced Self-supervised Learning for Multimodal Sentiment Analysis
    Qian, Fan
    Han, Jiqing
    He, Yongjun
    Zheng, Tieran
    Zheng, Guibin
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 12966 - 12978
  • [4] Self-supervised Correlation Mining Network for Person Image Generation
    Wang, Zijian
    Qi, Xingqun
    Yuan, Kun
    Sun, Muyi
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 7693 - 7702
  • [5] An autoencoder-based self-supervised learning for multimodal sentiment analysis
    Feng, Wenjun
    Wang, Xin
    Cao, Donglin
    Lin, Dazhen
    INFORMATION SCIENCES, 2024, 675
  • [6] Multi-level Contrastive Learning for Self-Supervised Vision Transformers
    Mo, Shentong
    Sun, Zhun
    Li, Chao
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 2777 - 2786
  • [7] A Multimodal Fake News Detection Model with Self-supervised Unimodal Label Generation
    Liu, Yun
    Wen, Zhipeng
    Jin, Minzhu
    Fan, Daoxin
    Li, Sifan
    Liu, Bo
    Jiang, Jinhe
    Xiao, Xianda
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT VIII, ICIC 2024, 2024, 14869 : 130 - 141
  • [8] Low-rank tensor fusion and self-supervised multi-task multimodal sentiment analysis
    Miao, Xinmeng
    Zhang, Xuguang
    Zhang, Haoran
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (23) : 63291 - 63308
  • [9] Multi-level language interaction transformer for multimodal sentiment analysis
    Li, Yongtai
    Liu, Anzhang
    Lu, Yanlong
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2025,
  • [10] Multi-level Multiple Attentions for Contextual Multimodal Sentiment Analysis
    Poria, Soujanya
    Cambria, Erik
    Hazarika, Devamanyu
    Mazumder, Navonil
    Zadeh, Amir
    Morency, Louis-Philippe
    2017 17TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2017, : 1033 - 1038