Attention-based multi-modal fusion sarcasm detection

Cited by: 1
Authors
Liu, Jing [1]
Tian, Shengwei [1]
Yu, Long [2]
Long, Jun [3, 4]
Zhou, Tiejun [5]
Wang, Bo [1]
Affiliations
[1] Xinjiang Univ, Sch Software, Urumqi, Xinjiang, Peoples R China
[2] Xinjiang Univ, Network & Informat Ctr, Urumqi, Xinjiang, Peoples R China
[3] Cent South Univ, Sch Informat Sci & Engn, Changsha, Peoples R China
[4] Cent South Univ, Big Data & Knowledge Engn Inst, Changsha, Peoples R China
[5] Xinjiang Internet Informat Ctr, Urumqi, Xinjiang, Peoples R China
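Funding
National Natural Science Foundation of China
Keywords
Multi-modal; sarcasm detection; Attention; ViT; D-BiGRU
DOI
10.3233/JIFS-213501
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract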
Sarcasm is a way of expressing one's thoughts in which the intended meaning is often the opposite of the literal meaning. Previous work on sarcasm detection focused mainly on text, but most information today is multi-modal, combining text and images, so multi-modal sarcasm detection has become an increasingly active research topic. To better detect the true meaning of multi-modal sarcastic content, this paper proposes a multi-modal fusion sarcasm detection model based on the attention mechanism. The model uses a Vision Transformer (ViT) to extract image features and a purpose-designed Double-Layer Bi-Directional Gated Recurrent Unit (D-BiGRU) to extract text features. The features of the two modalities are fused into a single feature vector, enhanced by attention, and then used for prediction. On the baseline datasets, the model improves on the best baseline model by 0.71% in F1-score and 0.38% in accuracy.
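The fuse, enhance, predict flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the fusion operator or the attention form, so concatenation and a per-component softmax gate are assumptions, and every name below (`fuse`, `attention_enhance`, `predict`, the weight vectors, the feature values) is hypothetical. The short lists stand in for the D-BiGRU text features and ViT image features.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(text_feat, image_feat):
    """Fuse two modality vectors into one (assumption: plain concatenation)."""
    return list(text_feat) + list(image_feat)


def attention_enhance(fused, attn_weights):
    """Rescale each component by a softmax attention weight.

    attn_weights stands in for learned parameters (hypothetical).
    """
    scores = [w * x for w, x in zip(attn_weights, fused)]
    alphas = softmax(scores)
    return [a * x for a, x in zip(alphas, fused)]


def predict(enhanced, clf_weights, bias=0.0):
    """Logistic classifier: returns a sarcasm probability in (0, 1)."""
    logit = sum(w * x for w, x in zip(clf_weights, enhanced)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Stand-in feature vectors (made up for illustration only).
text_feat = [0.2, -0.1, 0.4]    # would come from the text encoder
image_feat = [0.3, 0.0, -0.5]   # would come from the image encoder

fused = fuse(text_feat, image_feat)
enhanced = attention_enhance(fused, [1.0] * len(fused))
prob = predict(enhanced, [1.0] * len(enhanced))
```

In a real model the stand-in vectors would be encoder outputs and the weight vectors would be trained parameters; the sketch only shows the order of operations.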
Pages: 2097-2108
Page count: 12