Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer

Times cited: 30
Authors
Kittenplon, Yair [1]
Lavi, Inbal [1]
Fogel, Sharon [1]
Bar, Yarin [1]
Manmatha, R. [1]
Perona, Pietro [1]
Affiliation
[1] AWS AI Labs, Cambridge, England
Keywords
RECOGNITION;
DOI
10.1109/CVPR52688.2022.00456
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
End-to-end text spotting methods have recently gained attention in the literature because they jointly optimize the text detection and recognition components. Existing methods usually keep the detection and recognition branches distinctly separate, which requires exact annotations for both tasks. We introduce TextTranSpotter (TTS), a transformer-based approach for text spotting and the first text spotting framework that can be trained in both fully- and weakly-supervised settings. By learning a single latent representation per word detection and using a novel loss function based on the Hungarian loss, our method alleviates the need for expensive localization annotations. Trained with only text transcription annotations on real data, our weakly-supervised method achieves performance competitive with previous state-of-the-art fully-supervised methods. When trained in a fully-supervised manner, TextTranSpotter achieves state-of-the-art results on multiple benchmarks.
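The key idea behind the weakly-supervised setting is that predictions can be matched to ground-truth words using transcriptions alone, without box coordinates. The following is a minimal sketch of that idea under stated assumptions. It is not the paper's loss. The normalized edit distance in `text_cost` is a hypothetical stand-in for the model's recognition cost. `match_by_text` brute-forces the minimum-cost one-to-one assignment. This yields the same optimum the Hungarian algorithm computes, but it is only practical for a handful of queries.

```python
from itertools import permutations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (single-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def text_cost(pred: str, gt: str) -> float:
    """Hypothetical recognition cost: edit distance normalized to [0, 1]."""
    longest = max(len(pred), len(gt), 1)
    return edit_distance(pred, gt) / longest


def match_by_text(preds: list[str], gts: list[str]) -> tuple[list[tuple[int, int]], float]:
    """Minimum-cost one-to-one matching of ground-truth words to predictions.

    Uses transcriptions only (no localization), mirroring the weakly-supervised idea.
    Brute force over permutations finds the same optimal assignment as the
    Hungarian algorithm, but scales only to a few predictions.
    Assumes len(preds) >= len(gts); unmatched predictions would be treated as "no text".
    Returns (prediction_index, ground_truth_index) pairs and the total cost.
    """
    if len(preds) < len(gts):
        raise ValueError("need at least as many predictions as ground-truth words")
    best_pairs, best_cost = [], float("inf")
    # Each `chosen` assigns ground-truth word g to prediction chosen[g].
    for chosen in permutations(range(len(preds)), len(gts)):
        cost = sum(text_cost(preds[p], gts[g]) for g, p in enumerate(chosen))
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(p, g) for g, p in enumerate(chosen)]
    return best_pairs, best_cost
```

For example, `match_by_text(["stop", "caffe", "xq"], ["cafe", "stop"])` pairs prediction 1 with "cafe" and prediction 0 with "stop", for a total cost of 0.2. Prediction 2 stays unmatched and would be supervised as "no text". When location labels are available, a fully-supervised variant would presumably add a box or polygon term to the same matching cost.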
Pages: 4594-4603
Page count: 10