Text-image matching for multi-model machine translation

Cited by: 3
Authors
Shi, Xiayang [1 ]
Yu, Zhenqiang [2 ]
Wang, Xuhui [3 ]
Li, Yijun [3 ]
Niu, Yufeng [3 ]
Affiliations
[1] Zhengzhou Univ Light Ind, Coll Software Engn, Dongfeng Rd, Zhengzhou 450003, Peoples R China
[2] Zhengzhou Univ Light Ind, Coll Math & Informat Sci, Dongfeng Rd, Zhengzhou 450003, Peoples R China
[3] Inst Stand Measurement ShanXi Prov, Inspection & Testing Ctr ShanXi Prov, Changzhi Rd, Taiyuan 030000, Peoples R China
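Source
JOURNAL OF SUPERCOMPUTING | 2023 / Vol. 79 / Issue 16
Keywords
Multi-modal; Text-Image Matching; Similarity; Machine translation
DOI
10.1007/s11227-023-05318-9
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract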
Multi-modal machine translation (MMT) uses information from other modalities to assist text machine translation and obtain higher-quality translations. Many studies have shown that image information can improve the quality of text machine translation. However, the multi-modal corpora used in translation require extensive manual annotation, which makes them difficult to label, and the resulting scarcity of datasets has held back multi-modal machine translation research to some extent. To address the text-image annotation problem, we propose a text-image similarity matching method. The method encodes the text and the images, maps them into a vector space, and uses cosine similarity to select the image most similar to each text, thereby constructing a multi-modal dataset. We conducted experiments on the text-only Multi30K English-German corpus and the text-only WMT21 English-Hindi corpus. On Multi30K, our method improves BLEU by 8.4 points over text-only translation. Compared with manually annotated multi-modal datasets, it improves BLEU by 4.2 points. On the low-resource English-Hindi corpus, it improves BLEU by 3.4 points. Our method can therefore effectively support the construction of multi-modal machine translation datasets and, to some extent, advance multi-modal machine translation research.
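The matching step described in the abstract (embed text and images, then pick the image with the highest cosine similarity for each sentence) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the encoders, so the sketch assumes the sentence and image embeddings have already been computed as equal-length float vectors, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero vector is treated as matching nothing
    return dot / (norm_u * norm_v)


def best_matching_image(
    text_vec: Sequence[float], image_vecs: List[Sequence[float]]
) -> Tuple[int, float]:
    """Return (index, score) of the candidate image most similar to text_vec."""
    if not image_vecs:
        raise ValueError("need at least one candidate image")
    scores = [cosine_similarity(text_vec, img) for img in image_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


def build_pseudo_multimodal_corpus(
    sentence_vecs: List[Sequence[float]], image_vecs: List[Sequence[float]]
) -> List[int]:
    """Hypothetical helper: pair each sentence with the index of its best image."""
    return [best_matching_image(s, image_vecs)[0] for s in sentence_vecs]


if __name__ == "__main__":
    # Toy 3-dimensional embeddings standing in for real encoder outputs.
    images = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.5]]
    sentences = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.1]]
    print(build_pseudo_multimodal_corpus(sentences, images))
```

With the toy vectors above, the first sentence is paired with image 1 and the second with image 0. A real pipeline would typically normalize all embeddings once and compute the scores as a single matrix product, but the selection rule (argmax of cosine similarity per sentence) stays the same.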
Pages: 17810-17823
Number of pages: 14
Related papers
50 in total
  • [32] Text-image models of Old Indic poetry
    Polome, EC
    JOURNAL OF INDO-EUROPEAN STUDIES, 1997, 25 (3-4): : 446 - 446
  • [33] Data model descriptions and translation signatures in a multi-model framework
    Atzeni, Paolo
    Gianforme, Giorgio
    Cappellari, Paolo
    ANNALS OF MATHEMATICS AND ARTIFICIAL INTELLIGENCE, 2011, 63 (3-4) : 287 - 315
  • [34] Modeling by Clipped Furniture Parts: Design with Text-Image Model with Stability Understanding
    Yoshida, Hironori
    Itoh, Seiji
    PROCEEDINGS OF THE 2024 ACM INTERNATIONAL CONFERENCE ON INTERACTIVE MEDIA EXPERIENCES, IMX 2024, 2024, : 294 - 299
  • [35] News Image Annotation on a Large Parallel Text-Image Corpus
    Tirilly, Pierre
    Claveau, Vincent
    Gros, Patrick
    LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010,
  • [36] A Lightweight Multi-Scale Crossmodal Text-Image Retrieval Method in Remote Sensing
    Yuan, Zhiqiang
    Zhang, Wenkai
    Rong, Xuee
    Li, Xuan
    Chen, Jialiang
    Wang, Hongqi
    Fu, Kun
    Sun, Xian
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [37] Abstractive Text-Image Summarization Using Multi-Modal Attentional Hierarchical RNN
    Chen, Jingqiang
    Zhuge, Hai
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 4046 - 4056
  • [38] Multi-model fused framework for image annotation
    Chen, Z. (jingzhang@ecust.edu.cn), 1600, Institute of Computing Technology (26):
  • [39] MULTI-MODEL PREDICTION FOR IMAGE SET COMPRESSION
    Shi, Zhongbo
    Sun, Xiaoyan
    Wu, Feng
    2013 IEEE INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (IEEE VCIP 2013), 2013,
  • [40] Multi-model neural network for image classification
    Machado, RJ
    Neves, PECSA
    II WORKSHOP ON CYBERNETIC VISION, PROCEEDINGS, 1997, : 57 - 59