TEDT: Transformer-Based Encoding-Decoding Translation Network for Multimodal Sentiment Analysis

Cited by: 20
Authors
Wang, Fan [1 ]
Tian, Shengwei [1 ]
Yu, Long [2 ]
Liu, Jing [1 ]
Wang, Junwen [1 ]
Li, Kun [1 ]
Wang, Yongtao [1 ]
Affiliations
[1] Univ Xinjiang, Sch Software, Urumqi, Xinjiang, Peoples R China
[2] Univ Xinjiang, Network & Informat Ctr, Urumqi, Xinjiang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multimodal sentiment analysis; Transformer; Multimodal fusion; Multimodal attention; FUSION;
DOI
10.1007/s12559-022-10073-9
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multimodal sentiment analysis is a popular and challenging research topic in natural language processing, but the individual modalities in a video can affect sentiment analysis results differently. In the temporal dimension, natural language sentiment is influenced by nonnatural language sentiment, which may enhance or weaken the original sentiment of the current natural language. In addition, nonnatural language features are generally of poor quality, which hinders the effect of multimodal fusion. To address these issues, we propose a transformer-based multimodal encoding-decoding translation network and adopt a joint encoding-decoding method with text as the primary information and sound and image as the secondary information. To reduce the negative impact of nonnatural language data on natural language data, we propose a modality reinforcement cross-attention module that converts nonnatural language features into natural language features to improve their quality and better integrate multimodal features. Moreover, a dynamic filtering mechanism filters out the error information generated in the cross-modal interaction to further improve the final output. We evaluated the proposed method on two multimodal sentiment analysis benchmark datasets (MOSI and MOSEI), where it reached accuracies of 89.3% and 85.9%, respectively, and outperformed the current state-of-the-art methods. Our model can greatly improve the effect of multimodal fusion and more accurately analyze human sentiment.
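The abstract describes text as the primary modality, with cross-attention and a filtering step applied to the nonverbal streams, but gives no implementation details. The following is only a rough sketch of that general idea, not the authors' module: text positions act as queries over an audio sequence, and a sigmoid gate scales how much of the attended signal is added back to the text. All names, dimensions, random weights, and the gate itself (`cross_attention`, `gated_fusion`, `w_g`) are assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text, other, w_q, w_k, w_v):
    """Text positions (queries) attend over a nonverbal sequence (keys/values)."""
    q = text @ w_q                       # (T_text, d)
    k = other @ w_k                      # (T_other, d)
    v = other @ w_v                      # (T_other, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ v, weights

def gated_fusion(text, attended, w_g):
    """Hypothetical sigmoid gate deciding how much attended signal to keep."""
    logits = np.concatenate([text, attended], axis=-1) @ w_g
    gate = 1.0 / (1.0 + np.exp(-logits))
    return text + gate * attended, gate

# Toy inputs: 4 text steps, 6 audio steps; sizes are arbitrary assumptions.
rng = np.random.default_rng(0)
d_text, d_audio, d = 8, 5, 8
text = rng.normal(size=(4, d_text))
audio = rng.normal(size=(6, d_audio))
w_q = rng.normal(size=(d_text, d))
w_k = rng.normal(size=(d_audio, d))
w_v = rng.normal(size=(d_audio, d))
w_g = rng.normal(size=(d_text + d, d))

attended, weights = cross_attention(text, audio, w_q, w_k, w_v)
fused, gate = gated_fusion(text, attended, w_g)
print(attended.shape, fused.shape)
```

In this sketch the output keeps the length of the text sequence, so the audio stream is re-expressed at text positions before fusion; a trained model would learn the projection and gate weights rather than sample them at random.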
Pages: 289-303
Page count: 15
Related papers
50 in total
  • [1] TEDT: Transformer-Based Encoding–Decoding Translation Network for Multimodal Sentiment Analysis
    Fan Wang
    Shengwei Tian
    Long Yu
    Jing Liu
    Junwen Wang
    Kun Li
    Yongtao Wang
    Cognitive Computation, 2023, 15 : 289 - 303
  • [2] MEDT: Using Multimodal Encoding-Decoding Network as in Transformer for Multimodal Sentiment Analysis
    Qi, Qingfu
    Lin, Liyuan
    Zhang, Rui
    Xue, Chengrong
    IEEE ACCESS, 2022, 10 : 28750 - 28759
  • [3] Transformer-based Feature Reconstruction Network for Robust Multimodal Sentiment Analysis
    Yuan, Ziqi
    Li, Wei
    Xu, Hua
    Yu, Wenmeng
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 4400 - 4407
  • [4] Transformer-based adaptive contrastive learning for multimodal sentiment analysis
    Hu Y.
    Huang X.
    Wang X.
    Lin H.
    Zhang R.
    Multimedia Tools and Applications, 2025, 84 (3) : 1385 - 1402
  • [5] TMBL: Transformer-based multimodal binding learning model for multimodal sentiment analysis
    Huang, Jiehui
    Zhou, Jun
    Tang, Zhenchao
    Lin, Jiaying
    Chen, Calvin Yu-Chian
    KNOWLEDGE-BASED SYSTEMS, 2024, 285
  • [6] Transformer-Based Graph Convolutional Network for Sentiment Analysis
    AlBadani, Barakat
    Shi, Ronghua
    Dong, Jian
    Al-Sabri, Raeed
    Moctard, Oloulade Babatounde
    APPLIED SCIENCES-BASEL, 2022, 12 (03):
  • [7] A Transformer-based joint-encoding for Emotion Recognition and Sentiment Analysis
    Delbrouck, Jean-Benoit
    Tits, Noe
    Brousmiche, Mathilde
    Dupont, Stephane
    PROCEEDINGS OF THE SECOND GRAND CHALLENGE AND WORKSHOP ON MULTIMODAL LANGUAGE (CHALLENGE-HML), VOL 1, 2020, : 1 - 7
  • [8] ENCODING-DECODING OPTICAL FIBER NETWORK
    MAROM, E
    RAMER, OG
    ELECTRONICS LETTERS, 1978, 14 (03) : 48 - 49
  • [9] Transformer-Based Unified Neural Network for Quality Estimation and Transformer-Based Re-decoding Model for Machine Translation
    Chen, Cong
    Zong, Qinqin
    Luo, Qi
    Qiu, Bailian
    Li, Maoxi
    MACHINE TRANSLATION, CCMT 2020, 2020, 1328 : 66 - 75
  • [10] Transformer-based correlation mining network with self-supervised label generation for multimodal sentiment analysis
    Wang, Ruiqing
    Yang, Qimeng
    Tian, Shengwei
    Yu, Long
    He, Xiaoyu
    Wang, Bo
    NEUROCOMPUTING, 2025, 618