Text-image matching for multi-model machine translation

Cited by: 3
Authors
Shi, Xiayang [1]
Yu, Zhenqiang [2]
Wang, Xuhui [3]
Li, Yijun [3]
Niu, Yufeng [3]
Affiliations
[1] Zhengzhou Univ Light Ind, Coll Software Engn, Dongfeng Rd, Zhengzhou 450003, Peoples R China
[2] Zhengzhou Univ Light Ind, Coll Math & Informat Sci, Dongfeng Rd, Zhengzhou 450003, Peoples R China
[3] Inst Stand Measurement ShanXi Prov, Inspection & Testing Ctr ShanXi Prov, Changzhi Rd, Taiyuan 030000, Peoples R China
Source
JOURNAL OF SUPERCOMPUTING | 2023 / Vol. 79 / Issue 16
Keywords
Multi-modal; Text-Image Matching; Similarity; Machine translation;
DOI
10.1007/s11227-023-05318-9
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Multi-modal machine translation (MMT) uses information from other modalities to assist text machine translation and obtain higher-quality translations. Many studies have shown that image information can improve the quality of text machine translation. However, the multi-modal corpora used in translation require extensive manual annotation, which makes them difficult to label, and the resulting scarcity of datasets hinders work on multi-modal machine translation. To address the text-image annotation problem, we propose a text-image similarity matching method. The method encodes the text and the images, maps them into a vector space, and uses cosine similarity to select the image most similar to each text, thereby constructing a multi-modal dataset. We conducted experiments on the Multi30K English-German text-only corpus and the WMT21 English-Hindi text-only corpus. On the Multi30K corpus, our method improves on the text-only translation results by 8.4 BLEU. Compared with manually annotated multi-modal datasets, our method improves by 4.2 BLEU. On the low-resource English-Hindi corpus, it improves by 3.4 BLEU. Our method can therefore effectively support the construction of multi-modal machine translation datasets and, to some extent, advance multi-modal machine translation research.
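The matching step described in the abstract can be illustrated with a minimal sketch. It assumes the text and image encoders have already produced fixed-length vectors in the same space. The function names, file names, and toy embeddings below are hypothetical and are not taken from the paper; the paper's actual encoders are not specified here.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def best_matching_image(
    text_vec: Sequence[float],
    image_vecs: Dict[str, Sequence[float]],
) -> str:
    """Return the key of the image whose vector is most similar to text_vec."""
    if not image_vecs:
        raise ValueError("image_vecs must not be empty")
    return max(
        image_vecs,
        key=lambda name: cosine_similarity(text_vec, image_vecs[name]),
    )


# Toy embeddings standing in for encoder outputs (hypothetical values).
text_embedding = [0.9, 0.1, 0.0]
image_embeddings = {
    "dog.jpg": [0.8, 0.2, 0.1],
    "car.jpg": [0.0, 0.1, 0.9],
    "tree.jpg": [0.1, 0.9, 0.2],
}

# Pair the sentence with its most similar image.
match = best_matching_image(text_embedding, image_embeddings)
print(match)
```

Running the selection once per sentence of a text-only corpus would yield sentence-image pairs of the kind the abstract describes. For a large image pool, a vectorized or approximate nearest-neighbor search would likely replace this linear scan.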
Pages: 17810-17823
Page count: 14