Transformer-Based Interactive Multi-Modal Attention Network for Video Sentiment Detection

Citations: 0
Authors
Xuqiang Zhuang
Fangai Liu
Jian Hou
Jianhua Hao
Xiaohong Cai
Institutions
[1] Shandong Normal University,School of Information Science and Engineering
[2] Shandong University of Traditional Chinese Medicine,School of Intelligence and Information Engineering
Source
Neural Processing Letters | 2022 / Vol. 54
Keywords
Multimodal; Transformer; Sentiment detection
DOI
Not available
Abstract
Social media allows users to express opinions in multiple modalities such as text, pictures, and short videos. Multi-modal sentiment detection predicts the emotional tendencies expressed by users more effectively than relying on a single modality, and it has therefore received extensive attention in recent years. Current works treat each modality of a video's utterances as independent, ignoring the effective interaction among the different modalities of a video. To tackle these challenges, we propose a transformer-based interactive multi-modal attention network that investigates multi-modal paired attention between modalities and utterances for video sentiment detection. Specifically, we first take a series of utterances as input and use three separate transformer encoders to capture the utterance-level features of each modality. Subsequently, we introduce a multi-modal paired attention mechanism to learn the cross-modality information between modalities and utterances. Finally, we inject the cross-modality information into a multi-head self-attention layer to make the final emotion and sentiment classification. Our solution outperforms baseline models on three multi-modal datasets.
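The abstract outlines a three-step pipeline: per-modality transformer encoders over the utterance sequence, paired cross-modal attention between every pair of modalities, and a final multi-head self-attention layer before classification. The PyTorch sketch below illustrates one plausible reading of that pipeline. It is not the authors' code: the module name TIMANSketch, all layer sizes, the fusion-by-summation step, and the example feature dimensions (300/74/35, typical of multimodal sentiment benchmarks) are assumptions of this sketch.

import torch
import torch.nn as nn


class PairedCrossAttention(nn.Module):
    """Cross-modal attention: queries from one modality, keys/values from another."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, query_mod, context_mod):
        out, _ = self.attn(query_mod, context_mod, context_mod)
        return out


class TIMANSketch(nn.Module):
    """Hypothetical reading of the abstract's pipeline; not the paper's exact model."""

    def __init__(self, text_dim=300, audio_dim=74, visual_dim=35,
                 dim=128, heads=4, num_classes=2):
        super().__init__()
        # Project each modality's utterance features to a shared hidden size.
        self.proj = nn.ModuleDict({
            "text": nn.Linear(text_dim, dim),
            "audio": nn.Linear(audio_dim, dim),
            "visual": nn.Linear(visual_dim, dim),
        })
        # Step 1: three separate transformer encoders, one per modality,
        # over the utterance sequence of a video.
        self.encoders = nn.ModuleDict({
            m: nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True),
                num_layers=2)
            for m in ("text", "audio", "visual")
        })
        # Step 2: paired attention for every ordered pair of modalities.
        self.pairs = [(q, c) for q in ("text", "audio", "visual")
                      for c in ("text", "audio", "visual") if q != c]
        self.paired_attn = nn.ModuleDict({
            f"{q}_{c}": PairedCrossAttention(dim, heads) for q, c in self.pairs
        })
        # Step 3: multi-head self-attention over the fused cross-modality
        # features, followed by a per-utterance classifier.
        self.fusion_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, text, audio, visual):
        # Each input has shape (batch, num_utterances, modality_dim).
        raw = {"text": text, "audio": audio, "visual": visual}
        feats = {m: self.encoders[m](self.proj[m](x)) for m, x in raw.items()}
        # Summing the six paired-attention outputs is an assumed fusion choice.
        fused = sum(self.paired_attn[f"{q}_{c}"](feats[q], feats[c])
                    for q, c in self.pairs)
        fused, _ = self.fusion_attn(fused, fused, fused)
        return self.classifier(fused)  # (batch, num_utterances, num_classes)


# Example: 2 videos, 8 utterances each, with the illustrative feature sizes.
model = TIMANSketch()
logits = model(torch.randn(2, 8, 300), torch.randn(2, 8, 74),
               torch.randn(2, 8, 35))
print(logits.shape)  # torch.Size([2, 8, 2])

Concatenating the paired-attention outputs and projecting back to the hidden size would be an equally plausible fusion step; the abstract only specifies that the cross-modality information is injected into a multi-head self-attention layer.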
Pages: 1943 - 1960
Page count: 17
Related Papers
50 in total
  • [1] Transformer-Based Interactive Multi-Modal Attention Network for Video Sentiment Detection
    Zhuang, Xuqiang
    Liu, Fangai
    Hou, Jian
    Hao, Jianhua
    Cai, Xiaohong
    NEURAL PROCESSING LETTERS, 2022, 54 (03) : 1943 - 1960
  • [2] TMIF: transformer-based multi-modal interactive fusion for automatic rumor detection
    Lv, Jiandong
    Wang, Xingang
    Shao, Cuiling
    MULTIMEDIA SYSTEMS, 2022, 29 (5) : 2979 - 2989
  • [3] Multi-Modal Sentiment Analysis Based on Interactive Attention Mechanism
    Wu, Jun
    Zhu, Tianliang
    Zheng, Xinli
    Wang, Chunzhi
    APPLIED SCIENCES-BASEL, 2022, 12 (16):
  • [4] Dual-attention transformer-based hybrid network for multi-modal medical image segmentation
    Zhang, Menghui
    Zhang, Yuchen
    Liu, Shuaibing
    Han, Yahui
    Cao, Honggang
    Qiao, Bingbing
    SCIENTIFIC REPORTS, 2024, 14 (01):
  • [5] Multi-modal Motion Prediction with Transformer-based Neural Network for Autonomous Driving
    Huang, Zhiyu
    Mo, Xiaoyu
    Lv, Chen
    2022 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2022), 2022, : 2605 - 2611
  • [6] A Transformer-based Multi-modal Joint Attention Fusion Model for Molecular Property Prediction
    Wang, Ke
    Zhang, Wei
    Liu, Yong
    2023 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM 2023), 2023, : 4972 - 4974
  • [7] A Multi-Modal Transformer network for action detection
    Korban, Matthew
    Youngs, Peter
    Acton, Scott T.
    PATTERN RECOGNITION, 2023, 142
  • [8] MIA-Net: Multi-Modal Interactive Attention Network for Multi-Modal Affective Analysis
    Li, Shuzhen
    Zhang, Tong
    Chen, Bianna
    Chen, C. L. Philip
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2023, 14 (04) : 2796 - 2809
  • [9] Transformer-based Label Set Generation for Multi-modal Multi-label Emotion Detection
    Ju, Xincheng
    Zhang, Dong
    Li, Junhui
    Zhou, Guodong
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 512 - 520