Exploring Better Text Image Translation with Multimodal Codebook

Authors
Lan, Zhibin [1, 3]
Yu, Jiawei [1, 3]
Li, Xiang [2]
Zhang, Wen [2]
Luan, Jian [2]
Wang, Bin [2]
Huang, Degen [4]
Su, Jinsong [1, 3]
Affiliations
[1] Xiamen Univ, Sch Informat, Xiamen, Peoples R China
[2] Xiaomi AI Lab, Beijing, Peoples R China
[3] Xiamen Univ, Key Lab Digital Protect & Intelligent Proc Intang, Minist Culture & Tourism, Xiamen, Peoples R China
[4] Dalian Univ Technol, Dalian, Peoples R China
Source
PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1 | 2023
Funding
National Natural Science Foundation of China
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text image translation (TIT) aims to translate the source texts embedded in the image to target translations, which has a wide range of applications and thus has important research value. However, current studies on TIT are confronted with two main bottlenecks: 1) this task lacks a publicly available TIT dataset, 2) dominant models are constructed in a cascaded manner, which tends to suffer from the error propagation of optical character recognition (OCR). In this work, we first annotate a Chinese-English TIT dataset named OCRMT30K, providing convenience for subsequent studies. Then, we propose a TIT model with a multimodal codebook, which is able to associate the image with relevant texts, providing useful supplementary information for translation. Moreover, we present a multi-stage training framework involving text machine translation, image-text alignment, and TIT tasks, which fully exploits additional bilingual texts, OCR dataset and our OCRMT30K dataset to train our model. Extensive experiments and in-depth analyses strongly demonstrate the effectiveness of our proposed model and training framework.
Pages: 3479-3491
Page count: 13