Multi-level correlation mining framework with self-supervised label generation for multimodal sentiment analysis

Cited by: 17
Authors
Li, Zuhe [1 ]
Guo, Qingbing [1 ]
Pan, Yushan [2 ]
Ding, Weiping [3 ]
Yu, Jun [1 ]
Zhang, Yazhou [4 ]
Liu, Weihua [5 ]
Chen, Haoran [1 ]
Wang, Hao [6 ]
Xie, Ying [7 ]
Affiliations
[1] Zhengzhou Univ Light Ind, Sch Comp & Commun Engn, Zhengzhou 450002, Peoples R China
[2] Xian Jiaotong Liverpool Univ, Sch Adv Technol, Dept Comp, Suzhou 215123, Peoples R China
[3] Nantong Univ, Sch Informat Sci & Technol, Nantong 226019, Peoples R China
[4] Zhengzhou Univ Light Ind, Coll Software Engn, Zhengzhou 450002, Peoples R China
[5] China Mobile Res Inst, Beijing 100053, Peoples R China
[6] Xidian Univ, Xian 710071, Peoples R China
[7] Putian Univ, Putian 351100, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal sentiment analysis; Unimodal feature fusion; Linguistic-guided transformer; Self-supervised label generation; INFORMATION FUSION; MECHANISM; LSTM;
DOI
10.1016/j.inffus.2023.101891
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Fusion and co-learning are major challenges in multimodal sentiment analysis. Most existing methods either ignore the basic relationships among modalities or fail to exploit their potential correlations fully, and they do not leverage knowledge from resource-rich modalities when analyzing resource-poor ones. To address these challenges, we propose a multimodal sentiment analysis method based on multi-level correlation mining and self-supervised multi-task learning. First, we propose a unimodal feature fusion and linguistics-guided Transformer-based framework, the multi-level correlation mining framework, to overcome the difficulty of multimodal information fusion; the module exploits correlation information between modalities from low to high levels. Second, we divide the multimodal sentiment analysis task into one multimodal task and three unimodal tasks (linguistic, acoustic, and visual) and design a self-supervised label generation module (SLGM) to generate sentiment labels for the unimodal tasks. SLGM-based multi-task learning overcomes the lack of unimodal labels in co-learning. Through extensive experiments on the CMU-MOSI and CMU-MOSEI datasets, we demonstrate the superiority of the proposed multi-level correlation mining framework over state-of-the-art methods.
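The abstract describes generating unimodal pseudo-labels from the annotated multimodal label and training with a weighted multi-task objective. The minimal sketch below illustrates one plausible reading of that idea; the relative-distance update rule, function names, and loss weights are assumptions for illustration, not the authors' implementation.

```python
def generate_unimodal_label(m_label, d_pos, d_neg):
    """Hypothetical SLGM-style pseudo-label: shift the multimodal label
    toward the sentiment class whose center the unimodal representation
    is closer to. d_pos/d_neg are distances to the positive/negative
    class centers (assumed precomputed from the unimodal features)."""
    # relative offset in [-1, 1]: positive when closer to the positive center
    offset = (d_neg - d_pos) / (d_pos + d_neg + 1e-8)
    return m_label + offset


def multitask_loss(pred_m, label_m, preds_u, labels_u,
                   weights=(1.0, 0.3, 0.3, 0.3)):
    """Total loss = multimodal L1 term + weighted unimodal L1 terms
    (one per linguistic/acoustic/visual task, in that order)."""
    loss = weights[0] * abs(pred_m - label_m)
    for w, p, y in zip(weights[1:], preds_u, labels_u):
        loss += w * abs(p - y)
    return loss
```

For example, a sample with multimodal label 0.5 whose acoustic features lie much closer to the negative class center would receive a lower acoustic pseudo-label, letting the acoustic head learn modality-specific sentiment without human unimodal annotation.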
Pages: 13
Related Papers
50 records
  • [31] Multi-Label Self-Supervised Learning with Scene Images
    Zhu, Ke
    Fu, Minghao
    Wu, Jianxin
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 6671 - 6680
  • [32] Generalized Semantic Segmentation by Self-Supervised Source Domain Projection and Multi-Level Contrastive Learning
    Yang, Liwei
    Gu, Xiang
    Sun, Jian
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 9, 2023, : 10789 - 10797
  • [33] Self-supervised Multi-level Face Model Learning for Monocular Reconstruction at over 250 Hz
    Tewari, Ayush
    Zollhofer, Michael
    Garrido, Pablo
    Bernard, Florian
    Kim, Hyeongwoo
    Perez, Patrick
    Theobalt, Christian
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 2549 - 2559
  • [34] MaCon: A Generic Self-Supervised Framework for Unsupervised Multimodal Change Detection
    Wang, Jian
    Yan, Li
    Yang, Jianbing
    Xie, Hong
    Yuan, Qiangqiang
    Wei, Pengcheng
    Gao, Zhao
    Zhang, Ce
    Atkinson, Peter M.
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2025, 34 : 1485 - 1500
  • [35] Self-supervised Multimodal Emotion Recognition Combining Temporal Attention Mechanism and Unimodal Label Automatic Generation Strategy
    Sun Q.
    Wang S.
Dianzi Yu Xinxi Xuebao / Journal of Electronics and Information Technology, 2024, 46 (02): 588 - 601
  • [36] Self-Supervised Pyramid Representation Learning for Multi-Label Visual Analysis and Beyond
    Hsieh, Cheng-Yen
    Chang, Chih-Jung
    Yang, Fu-En
    Wang, Yu-Chiang Frank
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 2695 - 2704
  • [37] Multi-level fusion with deep neural networks for multimodal sentiment classification
    Zhang Guangwei
    Zhao Bing
    Li Ruifan
The Journal of China Universities of Posts and Telecommunications, 2022, 29 (03): 25 - 33
  • [38] Multimodal sentiment analysis with unimodal label generation and modality decomposition
    Zhu, Linan
    Zhao, Hongyan
    Zhu, Zhechao
    Zhang, Chenwei
    Kong, Xiangjie
    INFORMATION FUSION, 2025, 116
  • [39] Self-supervised sub-category exploration for Pseudo label generation
    Chern, Wei-Chih
    Kim, Taegeon
    Nguyen, Tam, V
    Asari, Vijayan K.
    Kim, Hongjo
    AUTOMATION IN CONSTRUCTION, 2023, 151
  • [40] Multimodal Sentiment Analysis via Low-Rank Tensor Attention Network with Unimodal Self-Supervised Learning
    Pan, Jie (panjie@sdnu.edu.cn), 1600, Institute of Electrical and Electronics Engineers Inc.