TEDT: Transformer-Based Encoding-Decoding Translation Network for Multimodal Sentiment Analysis

Cited by: 20

Authors:

Wang, Fan [1]

Tian, Shengwei [1]

Yu, Long [2]

Liu, Jing [1]

Wang, Junwen [1]

Li, Kun [1]

Wang, Yongtao [1]

Affiliations:

[1] Univ Xinjiang, Sch Software, Urumqi, Xinjiang, Peoples R China

[2] Univ Xinjiang, Network & Informat Ctr, Urumqi, Xinjiang, Peoples R China

Source:

COGNITIVE COMPUTATION | 2023 / Vol. 15 / Issue 01

Funding:

National Natural Science Foundation of China;

Keywords:

Multimodal sentiment analysis; Transformer; Multimodal fusion; Multimodal attention; FUSION;

DOI:

10.1007/s12559-022-10073-9

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Multimodal sentiment analysis is a popular and challenging research topic in natural language processing, but the individual modalities in a video can affect sentiment analysis results differently. In the temporal dimension, natural language sentiment is influenced by nonnatural language sentiment, which may enhance or weaken the original sentiment of the current natural language. In addition, nonnatural language features are generally of poor quality, which fundamentally limits the effectiveness of multimodal fusion. To address these issues, we propose a multimodal encoding-decoding translation network based on a transformer and adopt a joint encoding-decoding method with text as the primary information and sound and image as the secondary information. To reduce the negative impact of nonnatural language data on natural language data, we propose a modality reinforcement cross-attention module that converts nonnatural language features into natural language features to improve their quality and better integrate multimodal features. Moreover, a dynamic filtering mechanism filters out the erroneous information generated in the cross-modal interaction to further improve the final output. We evaluated the proposed method on two multimodal sentiment analysis benchmark datasets (MOSI and MOSEI), on which it achieved accuracies of 89.3% and 85.9%, respectively. In addition, our method outperformed the current state-of-the-art methods. Our model can greatly improve the effect of multimodal fusion and more accurately analyze human sentiment.


Pages: 289-303

Page count: 15
