Text-image matching for multi-modal machine translation

Cited by: 3
Authors
Shi, Xiayang [1 ]
Yu, Zhenqiang [2 ]
Wang, Xuhui [3 ]
Li, Yijun [3 ]
Niu, Yufeng [3 ]
Affiliations
[1] Zhengzhou Univ Light Ind, Coll Software Engn, Dongfeng Rd, Zhengzhou 450003, Peoples R China
[2] Zhengzhou Univ Light Ind, Coll Math & Informat Sci, Dongfeng Rd, Zhengzhou 450003, Peoples R China
[3] Inst Stand Measurement Shanxi Prov, Inspection & Testing Ctr Shanxi Prov, Changzhi Rd, Taiyuan 030000, Peoples R China
Source
JOURNAL OF SUPERCOMPUTING | 2023, Vol. 79, Issue 16
Keywords
Multi-modal; Text-image matching; Similarity; Machine translation
DOI
10.1007/s11227-023-05318-9
Chinese Library Classification (CLC)
TP3 [computing technology; computer technology]
Discipline code
0812
Abstract
Multi-modal machine translation (MMT) aims to use information from other modalities to assist text-based machine translation and obtain higher-quality translations. Many studies have shown that image information can improve the quality of text machine translation. However, the multi-modal corpora used for translation require extensive manual annotation, which makes them expensive to build, and this scarcity of datasets hinders multi-modal machine translation research to a certain extent. To address the text-image annotation problem, we propose a text-image similarity matching method. The method encodes the text and the candidate images, maps both into a shared vector space, and uses cosine similarity to retrieve the image most similar to each sentence, thereby constructing a multi-modal dataset. We conducted experiments on the text-only Multi30K English-German corpus and the text-only WMT21 English-Hindi corpus. On Multi30K, our method improves over the text-only translation baseline by 8.4 BLEU and outperforms a manually annotated multi-modal dataset by 4.2 BLEU. On the low-resource English-Hindi corpus, it gains 3.4 BLEU. Our method can therefore effectively support the construction of multi-modal machine translation datasets and, to some extent, advance research on multi-modal machine translation.
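
The matching procedure described in the abstract reduces to a nearest-neighbor search under cosine similarity in a shared text-image embedding space. The Python sketch below illustrates that idea only; the abstract does not name the encoders, so the CLIP model, the checkpoint string, and the helper match_images_to_text are illustrative assumptions, not the authors' implementation.

# A minimal sketch of text-image similarity matching, assuming a CLIP-style
# joint encoder (an assumption; the paper does not specify its encoders).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

def match_images_to_text(sentences, image_paths):
    """For each sentence, return the path of the most similar image."""
    images = [Image.open(p).convert("RGB") for p in image_paths]
    with torch.no_grad():
        # Encode sentences and images into the shared embedding space.
        t = processor(text=sentences, return_tensors="pt",
                      padding=True, truncation=True)
        text_emb = model.get_text_features(**t)
        v = processor(images=images, return_tensors="pt")
        img_emb = model.get_image_features(**v)
    # Cosine similarity = dot product of L2-normalized embeddings.
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    sims = text_emb @ img_emb.T          # shape: (num_texts, num_images)
    best = sims.argmax(dim=-1)           # closest image index per sentence
    return [image_paths[i] for i in best.tolist()]

In the paper's setting, the sentences would come from a text-only corpus (e.g., Multi30K text or WMT21 English-Hindi) and the image pool from an existing image collection; pairing each sentence with its highest-similarity image yields the automatically constructed multi-modal dataset.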
Pages: 17810 - 17823
Page count: 14
Related papers
50 items in total
  • [21] Text-Image Retrieval With Salient Features
    Feng, Xia
    Hu, Zhiyi
    Liu, Caihua
    Ip, W. H.
    Chen, Huiying
    JOURNAL OF DATABASE MANAGEMENT, 2021, 32 (04) : 1 - 13
  • [22] Multi-Model Semantic Interaction for Text Analytics
    Bradel, Lauren
    North, Chris
    House, Leanna
    Leman, Scotland
    2014 IEEE CONFERENCE ON VISUAL ANALYTICS SCIENCE AND TECHNOLOGY (VAST), 2014, : 163 - 172
  • [23] Text-image multimodal fusion model for enhanced fake news detection
    Lin, Szu-Yin
    Chen, Yen-Chiu
    Chang, Yu-Han
    Lo, Shih-Hsin
    Chao, Kuo-Ming
    SCIENCE PROGRESS, 2024, 107 (04)
  • [24] Text-image coupling for editing literary sources
    Lecolinet, E.
    Robert, L.
    Role, F.
    COMPUTERS AND THE HUMANITIES, 2002, 36 (01) : 49 - 73
  • [25] Text-Image Matching for Cross-Modal Remote Sensing Image Retrieval via Graph Neural Network
    Yu, Hongfeng
    Yao, Fanglong
    Lu, Wanxuan
    Liu, Nayu
    Li, Peiguang
    You, Hongjian
    Sun, Xian
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2023, 16 : 812 - 824
  • [26] Multi-model SAR image despeckling
    Wang, C
    Wang, RS
    ELECTRONICS LETTERS, 2002, 38 (23) : 1425 - 1426
  • [27] Recursive Projection Profiling for Text-Image Separation
    Krishnamoorthy, Shivsubramani
    Loganathan, R.
    Soman, K. P.
    INNOVATIONS IN COMPUTING SCIENCES AND SOFTWARE ENGINEERING, 2010, : 1 - 5
  • [29] A Machine Learning-Based Approach to Automatic Multi-Model History Matching and Dynamic Prediction
    Feng, Guoqing
    Mo, Haishuai
    Wu, Baofeng
    He, Yujun
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2025,
  • [30] A performance-driven hybrid text-image classification model for multimodal data
    Gupta, Swati
    Kishan, Bal
    SCIENTIFIC REPORTS, 2025, 15 (01):