TEDT: Transformer-Based Encoding-Decoding Translation Network for Multimodal Sentiment Analysis

Cited by: 20
Authors
Wang, Fan [1]
Tian, Shengwei [1]
Yu, Long [2]
Liu, Jing [1]
Wang, Junwen [1]
Li, Kun [1]
Wang, Yongtao [1]
Affiliations
[1] Univ Xinjiang, Sch Software, Urumqi, Xinjiang, Peoples R China
[2] Univ Xinjiang, Network & Informat Ctr, Urumqi, Xinjiang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multimodal sentiment analysis; Transformer; Multimodal fusion; Multimodal attention
DOI
10.1007/s12559-022-10073-9
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Multimodal sentiment analysis is a popular and challenging research topic in natural language processing, but the individual modalities in a video can influence the analysis result to different degrees. In the temporal dimension, the sentiment expressed in natural language is modulated by nonverbal signals, which may strengthen or weaken the sentiment of the accompanying language. In addition, nonverbal features are generally of poor quality, which fundamentally limits the effectiveness of multimodal fusion. To address these issues, we propose a transformer-based multimodal encoding-decoding translation network that adopts a joint encoding-decoding scheme in which text is the primary modality and audio and vision are secondary modalities. To reduce the negative impact of nonverbal data on language data, we introduce a modality reinforcement cross-attention module that translates nonverbal features into the language feature space, improving their quality and allowing the modalities to be fused more effectively. Moreover, a dynamic filtering mechanism removes erroneous information generated during cross-modal interaction, further improving the final output. We evaluated the proposed method on two multimodal sentiment analysis benchmark datasets, MOSI and MOSEI, on which it achieved accuracies of 89.3% and 85.9%, respectively, outperforming current state-of-the-art methods. Our model thus substantially improves multimodal fusion and analyzes human sentiment more accurately.
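To make the described fusion concrete, the following is a minimal PyTorch sketch of the core idea from the abstract: text features serve as queries while nonverbal (audio or visual) features serve as keys and values, so the nonverbal stream is re-expressed in the language feature space, and a sigmoid gate stands in for the dynamic filtering mechanism. This is an illustrative sketch under stated assumptions, not the authors' TEDT implementation; the class name ModalityReinforcementAttention, the gating design, and all dimensions are hypothetical.

# Sketch: cross-attention that "translates" nonverbal features into the
# language feature space, with a gated residual standing in for the paper's
# dynamic filtering. Hypothetical names and sizes; not the TEDT code.
import torch
import torch.nn as nn

class ModalityReinforcementAttention(nn.Module):
    def __init__(self, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        # Text tokens query the nonverbal sequence.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        # Sigmoid gate that suppresses unreliable cross-modal information.
        self.gate = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Sigmoid())

    def forward(self, text: torch.Tensor, nonverbal: torch.Tensor) -> torch.Tensor:
        # text: (batch, text_len, d_model); nonverbal: (batch, nv_len, d_model)
        translated, _ = self.attn(query=text, key=nonverbal, value=nonverbal)
        g = self.gate(torch.cat([text, translated], dim=-1))
        return self.norm(text + g * translated)  # gated residual fusion

# Usage: fuse an 8-step audio sequence into a 20-token text representation.
mra = ModalityReinforcementAttention()
text = torch.randn(2, 20, 128)
audio = torch.randn(2, 8, 128)
print(mra(text, audio).shape)  # torch.Size([2, 20, 128])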
Pages: 289 - 303
Page count: 15
Related Papers
Showing records 31-40 of 50
  • [31] Hierarchical Interactive Multimodal Transformer for Aspect-Based Multimodal Sentiment Analysis
    Yu, Jianfei
    Chen, Kai
    Xia, Rui
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2023, 14 (03) : 1966 - 1978
  • [32] Tweets Topic Classification and Sentiment Analysis Based on Transformer-Based Language Models
    Mandal, Ranju
    Chen, Jinyan
    Becken, Susanne
    Stantic, Bela
    VIETNAM JOURNAL OF COMPUTER SCIENCE, 2023, 10 (02) : 117 - 134
  • [33] Transformer-based Relation Detect Model for Aspect-based Sentiment Analysis
    Wei, Zixi
    Xu, Xiaofei
    Li, Lijian
    Qin, Kaixin
    Li, Li
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [34] Explainable Sentiment Analysis: A Hierarchical Transformer-Based Extractive Summarization Approach
    Bacco, Luca
    Cimino, Andrea
    Dell'Orletta, Felice
    Merone, Mario
    ELECTRONICS, 2021, 10 (18)
  • [35] A transformer-based deep learning model for Persian moral sentiment analysis
    Karami, Behnam
    Bakouie, Fatemeh
    Gharibzadeh, Shahriar
    JOURNAL OF INFORMATION SCIENCE, 2023,
  • [36] Transformer-Based Feature Fusion Approach for Multimodal Visual Sentiment Recognition Using Tweets in the Wild
    Alzamzami, Fatimah
    El Saddik, Abdulmotaleb
    IEEE ACCESS, 2023, 11 : 47070 - 47079
  • [37] Multimodal Sentiment Analysis Based on Interactive Transformer and Soft Mapping
    Li, Zuhe
    Guo, Qingbing
    Feng, Chengyao
    Deng, Lujuan
    Zhang, Qiuwen
    Zhang, Jianwei
    Wang, Fengqin
    Sun, Qian
    WIRELESS COMMUNICATIONS & MOBILE COMPUTING, 2022, 2022
  • [38] Video Review Analysis via Transformer-based Sentiment Change Detection
    Wu, Zilong
    Huang, Siyuan
    Zhang, Rui
    Li, Lin
    THIRD INTERNATIONAL CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL (MIPR 2020), 2020, : 330 - 335
  • [39] Transformer-Based Model for Auditory EEG Decoding
    Chen, Jiaxin
    Liu, Yin-Long
    Feng, Rui
    Yuan, Jiahong
    Ling, Zhen-Hua
    MAN-MACHINE SPEECH COMMUNICATION, NCMMSC 2024, 2025, 2312 : 129 - 143
  • [40] A transformer-encoder-based multimodal multi-attention fusion network for sentiment analysis
    Liu, Cong
    Wang, Yong
    Yang, Jing
    APPLIED INTELLIGENCE, 2024, 54 (17-18) : 8415 - 8441