Transformer-Based Interactive Multi-Modal Attention Network for Video Sentiment Detection

Authors
Xuqiang Zhuang
Fangai Liu
Jian Hou
Jianhua Hao
Xiaohong Cai
Affiliations
[1] Shandong Normal University, School of Information Science and Engineering
[2] Shandong University of Traditional Chinese Medicine, School of Intelligence and Information Engineering
Source
Neural Processing Letters | 2022 / Vol. 54
Keywords
Multimodal; Transformer; Sentiment detection
DOI
Not available
Abstract
Social media allows users to express opinions in multiple modalities such as text, pictures, and short videos. Multi-modal sentiment detection can more effectively predict the emotional tendencies expressed by users and has therefore received extensive attention in recent years. Current works treat utterances from videos as independent modalities, ignoring the effective interaction among the different modalities of a video. To tackle this problem, we propose a transformer-based interactive multi-modal attention network that investigates multi-modal paired attention between multiple modalities and utterances for video sentiment detection. Specifically, we first take a series of utterances as input and use three separate transformer encoders to capture the utterance-level features of each modality. Subsequently, we introduce multi-modal paired attention mechanisms to learn the cross-modality information between multiple modalities and utterances. Finally, we inject the cross-modality information into the multi-headed self-attention layer to make the final emotion and sentiment classification. Our solution outperforms baseline models on three multi-modal datasets.
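The abstract does not give the exact form of the paired attention, so the following is only a minimal sketch of one plausible reading: standard scaled dot-product attention applied in both directions between two modalities, with no learned projections. The function names (`softmax`, `cross_attention`) and the toy feature values are assumptions made for illustration and are not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key rows."""
    dim = len(keys[0])
    attended = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(feature size).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value rows.
        attended.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return attended

# Hypothetical utterance-level features: 3 utterances, 2 dimensions per modality.
text_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio_feats = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]

# Paired attention in both directions for one modality pair (text, audio).
text_from_audio = cross_attention(text_feats, audio_feats, audio_feats)
audio_from_text = cross_attention(audio_feats, text_feats, text_feats)
```

With three modalities, the same call would be repeated for each ordered pair. How the resulting cross-modality features are injected into the final multi-headed self-attention layer is not specified in the abstract and is left out of this sketch.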
Pages: 1943 - 1960
Page count: 17