Multimodal dual perception fusion framework for multimodal affective analysis

Cited: 0
Authors
Lu, Qiang [1]
Sun, Xia [1]
Long, Yunfei [2]
Zhao, Xiaodi [1]
Zou, Wang [1]
Feng, Jun [1]
Wang, Xuxin [1]
Affiliations
[1] Northwest Univ, Sch Informat Sci & Technol, Xian 710127, Peoples R China
[2] Univ Essex, Sch Comp Sci & Elect Engn, Colchester CO4 3SQ, England
Keywords
Multimodal sentiment analysis; Sarcasm detection; Fake news detection; Multimodal affective analysis; Multimodal dual perception fusion
DOI
10.1016/j.inffus.2024.102747
CLC number
TP18 [Theory of Artificial Intelligence]
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The misuse of social platforms and the difficulty of regulating posted content have produced a surge of negative sentiment, sarcasm, and rampant fake news. In response, multimodal sentiment analysis, sarcasm detection, and fake news detection based on image and text have recently attracted considerable attention. Because these tasks share semantic and sentiment features and face related fusion challenges in deciphering complex human expressions across modalities, integrating them into a unified framework promises to simplify sentiment-analysis research and to improve classification tasks that involve both semantic and sentiment modeling. We therefore treat these tasks as integral components of a broader research area, multimodal affective analysis oriented toward semantics and sentiment, and propose a novel multimodal dual perception fusion framework (MDPF). Specifically, MDPF comprises three core procedures: (1) generating bootstrapping language-image knowledge to enrich the original modality space, and using cross-modal contrastive learning to align the text and image modalities and capture their underlying semantics and interactions; (2) designing a dynamic connective mechanism that adaptively matches image-text pairs, jointly employing a Gaussian-weighted distribution to intensify semantic sequences; (3) constructing a cross-modal graph that preserves the structured information of both image and text and shares information between modalities, while introducing sentiment knowledge to refine the graph's edge weights and capture cross-modal sentiment interaction. We evaluate MDPF on three publicly available datasets across the three tasks, and the empirical results demonstrate the superiority of the proposed model.
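Step (1) of the abstract centers on cross-modal contrastive learning for aligning the text and image modalities. As a minimal sketch of that idea, and not the authors' actual implementation, the snippet below computes a symmetric InfoNCE-style contrastive loss over a batch of matched image-text embeddings; the encoder outputs, embedding dimension, and temperature value are all illustrative assumptions.

```python
# Minimal sketch of cross-modal contrastive alignment (CLIP-style InfoNCE).
# Encoder choice, dimensions, and temperature are assumptions for illustration.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(text_emb: torch.Tensor,
                               image_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of matched image-text pairs.

    text_emb, image_emb: (batch, dim) embeddings from the two encoders.
    The i-th text and i-th image form the positive pair; every other
    pairing in the batch serves as a negative.
    """
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    logits = text_emb @ image_emb.t() / temperature  # (batch, batch) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Average the text-to-image and image-to-text directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy usage with random embeddings standing in for encoder outputs:
loss = contrastive_alignment_loss(torch.randn(8, 256), torch.randn(8, 256))
```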
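Step (2) intensifies semantic sequences with a Gaussian-weighted distribution. The abstract does not spell out the weighting, so the following is only one plausible reading: token features are re-weighted by a Gaussian kernel centered on an anchor position, so tokens near the anchor are emphasized and distant ones attenuated. The anchor selection and the bandwidth sigma are assumptions.

```python
# Hedged sketch of Gaussian-weighted sequence intensification; the anchor
# position and sigma are illustrative assumptions, not the paper's values.
import torch

def gaussian_intensify(seq: torch.Tensor, anchor: int,
                       sigma: float = 2.0) -> torch.Tensor:
    """Re-weight a token sequence of shape (length, dim) with a Gaussian
    kernel centered at `anchor`; nearby tokens keep most of their
    magnitude while distant ones are damped."""
    positions = torch.arange(seq.size(0), dtype=seq.dtype, device=seq.device)
    weights = torch.exp(-((positions - anchor) ** 2) / (2 * sigma ** 2))
    return seq * weights.unsqueeze(-1)

# Toy usage: emphasize tokens around position 5 in a 12-token sequence.
out = gaussian_intensify(torch.randn(12, 64), anchor=5)
```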
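Step (3) refines the edge weights of a cross-modal graph with sentiment knowledge. The sketch below assumes edges start from cosine similarity between text-node and image-node features and that an external resource supplies a per-node sentiment score in [-1, 1]; the agreement-based refinement rule is purely illustrative and not the authors' formulation.

```python
# Hedged sketch of sentiment-refined cross-modal graph edges: cosine-similarity
# edges rescaled by how well the two endpoints' sentiment scores agree.
import torch
import torch.nn.functional as F

def sentiment_refined_adjacency(text_nodes: torch.Tensor,
                                image_nodes: torch.Tensor,
                                text_sent: torch.Tensor,
                                image_sent: torch.Tensor) -> torch.Tensor:
    """Build a bipartite text-image adjacency from cosine similarity, then
    rescale each edge by sentiment agreement (1 when the two nodes' scores
    match, 0 when they are opposite). Both the base similarity and the
    refinement rule are assumptions for illustration."""
    sim = F.normalize(text_nodes, dim=-1) @ F.normalize(image_nodes, dim=-1).t()
    # |s_i - s_j| lies in [0, 2] for scores in [-1, 1]; map to agreement in [0, 1].
    agreement = 1.0 - torch.abs(text_sent.unsqueeze(1) - image_sent.unsqueeze(0)) / 2.0
    weights = sim.clamp(min=0) * agreement  # suppress sentiment-dissonant edges
    return weights / weights.sum(dim=-1, keepdim=True).clamp(min=1e-8)  # row-normalize

# Toy usage: 4 text nodes and 3 image nodes with random features and sentiments.
A = sentiment_refined_adjacency(torch.randn(4, 64), torch.randn(3, 64),
                                torch.rand(4) * 2 - 1, torch.rand(3) * 2 - 1)
```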
Pages: 12