Transformer-Based Interactive Multi-Modal Attention Network for Video Sentiment Detection

Authors
Xuqiang Zhuang
Fangai Liu
Jian Hou
Jianhua Hao
Xiaohong Cai
Affiliations
[1] Shandong Normal University, School of Information Science and Engineering
[2] Shandong University of Traditional Chinese Medicine, School of Intelligence and Information Engineering
Source
Neural Processing Letters | 2022 / Vol. 54
Keywords
Multimodal; Transformer; Sentiment detection
DOI
Not available
Abstract
Social media allows users to express opinions in multiple modalities such as text, pictures, and short videos. Multi-modal sentiment detection can predict the emotional tendencies expressed by users more effectively, and it has therefore received extensive attention in recent years. Current works treat utterances from videos as independent modalities, ignoring the effective interaction among the different modalities of a video. To tackle this challenge, we propose a transformer-based interactive multi-modal attention network that investigates multi-modal paired attention between multiple modalities and utterances for video sentiment detection. Specifically, we first take a series of utterances as input and use three separate transformer encoders to capture the utterance-level features of each modality. Subsequently, we introduce multi-modal paired attention mechanisms to learn the cross-modality information between multiple modalities and utterances. Finally, we inject the cross-modality information into a multi-head self-attention layer to make the final emotion and sentiment classification. Our solution outperforms baseline models on three multi-modal datasets.
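The paired-attention step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain scaled dot-product attention in which the utterances of one modality act as queries over the utterances of another, and it assumes the cross-modal results are simply concatenated per utterance. The function names (`cross_attention`, `paired_attention`), the toy feature values, and the omission of the transformer encoders and the final self-attention layer are all illustrative choices.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention (assumed form): every query utterance
    attends over all utterances of another modality."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def paired_attention(a, b):
    """Bidirectional pair: a attends to b, and b attends to a."""
    return cross_attention(a, b, b), cross_attention(b, a, a)


# Hypothetical utterance-level features: 3 utterances, 2 dimensions per modality.
text = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]
visual = [[0.2, 0.8], [0.9, 0.1], [0.4, 0.4]]

text_from_audio, audio_from_text = paired_attention(text, audio)
text_from_visual, visual_from_text = paired_attention(text, visual)

# Assumed fusion: concatenate each text utterance with its cross-modal views.
fused = [t + ta + tv for t, ta, tv in zip(text, text_from_audio, text_from_visual)]
print(len(fused), len(fused[0]))
```

In a full model, the fused utterance representations would then pass through a self-attention layer and a classifier; those stages are left out here to keep the sketch short.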
Pages: 1943-1960
Page count: 17