Modal Contrastive Learning Based End-to-End Text Image Machine Translation

Authors
Ma, Cong [1, 2]
Han, Xu [1, 2]
Wu, Linghui [1, 2]
Zhang, Yaping [1, 2]
Zhao, Yang [1, 2]
Zhou, Yu [1, 2]
Zong, Chengqing [1, 2]
Affiliations
[1] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Transformers; Machine translation; Decoding; Semantics; Pipelines; Text recognition; Task analysis; Text image machine translation; contrastive learning; text image recognition; machine translation; RECOGNITION
DOI
10.1109/TASLP.2023.3324540
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Text image machine translation (TIMT) aims to translate source-language text embedded in images directly into the target language. Most existing systems follow a cascaded pipeline from recognition to translation, which suffers from error propagation, parameter redundancy, and information reduction. An end-to-end model has the potential to alleviate these issues by bridging the recognition and translation models. However, the challenges are limited data and the modality gap between text and image. In this paper, we propose a novel end-to-end model, namely Modal contrastive learning based End-to-end Text Image Machine Translation (METIMT), which alleviates these issues through an end-to-end text image machine translation architecture and modal contrastive learning. Specifically, an image encoder is designed to encode images into the same feature space as the corresponding text sentences, with the guidance of an intra-modal and inter-modal contrastive learning module. To further promote research on text image machine translation, we have constructed one synthetic and two real-world datasets. Extensive experiments show that our lighter, faster model outperforms not only existing pipeline methods but also state-of-the-art end-to-end models on both synthetic and real-world evaluation sets. Our code and dataset will be released to the public.
Pages: 2153-2165
Number of pages: 13