RTNet: An End-to-End Method for Handwritten Text Image Translation

Cited by: 4
Authors
Su, Tonghua [1]
Liu, Shuchen [1]
Zhou, Shengjie [1]
Affiliations
[1] Harbin Inst Technol, Sch Software, Harbin, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Machine translation; Text recognition; Image text translation; Handwritten text; End-to-End;
DOI
10.1007/978-3-030-86331-9_7
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Text image recognition and translation have a wide range of applications. A straightforward solution is a two-stage approach: first recognize the text, then translate it into the target language. In that approach, the handwritten text recognition model and the machine translation model are trained separately, so any transcription error may degrade the translation quality. This paper proposes an end-to-end learning architecture that directly translates English handwritten text in images into Chinese. The handwriting recognition task and the translation task are combined in a unified deep learning model. First, a visual encoder processes the image; next, a feature transformer bridges the semantic gap; finally, a textual decoder generates the target sentence. To train the model effectively, we use transfer learning to improve its generalization under low-resource conditions. Experiments compare our method with the traditional two-stage one. The results indicate that the performance of the end-to-end model improves greatly as the amount of training data increases. Furthermore, when a larger amount of training data is available, the end-to-end model is more advantageous.
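The abstract's remark that a transcription error may degrade translation quality in a two-stage pipeline can be illustrated with a toy sketch. Everything below is hypothetical and only for illustration: `TOY_LEXICON`, `toy_recognize`, `toy_translate`, and the simulated misreading are assumptions, not the paper's models, data, or results.

```python
# Toy illustration of error propagation in a two-stage (recognize, then
# translate) pipeline. The lexicon and both stages are hypothetical stand-ins,
# not the models described in the paper.

TOY_LEXICON = {"the": "这", "cat": "猫", "cut": "切", "sat": "坐"}


def toy_recognize(words, confusions):
    """Stage 1: 'transcribe' each word, applying any simulated misreadings."""
    return [confusions.get(word, word) for word in words]


def toy_translate(words):
    """Stage 2: word-by-word lookup; words missing from the lexicon become '<unk>'."""
    return [TOY_LEXICON.get(word, "<unk>") for word in words]


def two_stage(words, confusions):
    """Run stage 2 on whatever stage 1 produced; stage 2 never sees the image."""
    return toy_translate(toy_recognize(words, confusions))


handwritten = ["the", "cat", "sat"]

# No recognition errors: the translator receives the intended words.
clean = two_stage(handwritten, confusions={})

# One simulated misreading ("cat" read as "cut") changes the translated word,
# and the translator has no way to recover from it.
noisy = two_stage(handwritten, confusions={"cat": "cut"})
```

In this sketch the second stage only ever sees the first stage's output, which is the structural weakness the abstract attributes to separately trained models; the end-to-end architecture instead feeds visual features toward the decoder within a single model.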
Pages: 99-113
Number of pages: 15