Multi-modal simultaneous machine translation fusion of image information

Cited by: 1

Authors:

Huang, Yan [1]

Wang, Zhanyang [1]

Zhang, TianYuan [1]

Xu, Chun [2]

Liang, Hui [1]

Affiliations:

[1] Zhengzhou Univ Light Ind, Coll Software Engn, Zhengzhou, Henan, Peoples R China

[2] Xinjiang Univ Finance & Econ, Coll Comp, Urumqi, Xinjiang, Peoples R China

Source:

JOURNAL OF ENGINEERING RESEARCH | 2023 / Vol. 11 / No. 02

Keywords:

Simultaneous translation; Real-time; Surrounding scenes; Multi-modal; Image information;

DOI:

10.1016/j.jer.2023.100085

Chinese Library Classification (CLC) number:

T [Industrial Technology];

Discipline classification code:

08;

Abstract:

Simultaneous translation means translating a sentence before the speaker has finished it, so that the speaker's intention can be understood in real time. At present, simultaneous machine translation still relies on text-to-text data resources. In a pure-text translation system, the decoder takes only the encoder's output as its input data resource. This information is derived solely from the text content, so the decoder has a single source of input; it is left short of decoding information, and words are missed in translation. Human interpreters also visually capture information from the surrounding scene to assist their translation work. Based on this observation, we propose a multi-modal simultaneous machine translation method that fuses image information. We extract information from the image and add it to the decoder side of the translation system, which enlarges the decoder's input data resources and helps the system improve translation quality. We use the Multi30K dataset for experimental verification. Compared with a plain-text translation system, the proposed method translates more complete sentences with richer content and achieves better translation results.

Pages: 7

