Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer

Cited by: 30
Authors
Kittenplon, Yair [1]
Lavi, Inbal [1]
Fogel, Sharon [1]
Bar, Yarin [1]
Manmatha, R. [1]
Perona, Pietro [1]
Affiliations
[1] AWS AI Labs, Cambridge, England
Keywords
RECOGNITION;
DOI
10.1109/CVPR52688.2022.00456
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
End-to-end text spotting methods have recently gained attention in the literature due to the benefits of jointly optimizing the text detection and recognition components. Existing methods usually keep the detection and recognition branches distinctly separate, requiring exact annotations for both tasks. We introduce TextTranSpotter (TTS), a transformer-based approach for text spotting and the first text spotting framework that can be trained in both fully- and weakly-supervised settings. By learning a single latent representation per word detection, and using a novel loss function based on the Hungarian loss, our method alleviates the need for expensive localization annotations. Trained with only text transcription annotations on real data, our weakly-supervised method achieves performance competitive with previous state-of-the-art fully-supervised methods. When trained in a fully-supervised manner, TextTranSpotter shows state-of-the-art results on multiple benchmarks.
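The abstract mentions a loss based on the Hungarian loss that works from text transcriptions alone. This record does not give the loss itself, so the following is only a minimal sketch of the general idea of matching predicted words to ground-truth transcriptions without any box annotations. It assumes the matching cost is plain edit distance, and it uses a brute-force search over assignments in place of the Hungarian algorithm (practical only for a handful of words). The names `edit_distance` and `match_by_text` are hypothetical and not taken from the paper.

```python
from itertools import permutations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def match_by_text(predicted, targets):
    """Assign one distinct prediction to each target, minimizing total edit distance.

    Returns (assignment, cost), where assignment[i] is the index of the
    prediction matched to targets[i]. Assumption: the cost is text-only,
    so no localization annotation is needed. Brute force stands in for
    the Hungarian algorithm and is exponential in the number of words.
    """
    if len(predicted) < len(targets):
        raise ValueError("need at least as many predictions as targets")
    best_cost, best = None, None
    for perm in permutations(range(len(predicted)), len(targets)):
        cost = sum(edit_distance(predicted[p], t) for p, t in zip(perm, targets))
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, perm
    return list(best), best_cost


# Three predicted words, two transcription-only labels.
assignment, cost = match_by_text(["exit", "stqp", "open"], ["stop", "exit"])
print(assignment, cost)
```

In a training loop, a recognition loss would then be computed only on the matched pairs; unmatched predictions would be treated as "no word". That step is also an assumption about how such a loss is typically used, not a description of the paper's method.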
Pages: 4594 - 4603
Page count: 10