Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer

Cited by: 30

Authors:

Kittenplon, Yair [1]

Lavi, Inbal [1]

Fogel, Sharon [1]

Bar, Yarin [1]

Manmatha, R. [1]

Perona, Pietro [1]

Affiliations:

[1] AWS AI Labs, Cambridge, England

Source:

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) | 2022

Keywords:

RECOGNITION;

DOI:

10.1109/CVPR52688.2022.00456

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Text spotting end-to-end methods have recently gained attention in the literature due to the benefits of jointly optimizing the text detection and recognition components. Existing methods usually have a distinct separation between the detection and recognition branches, requiring exact annotations for the two tasks. We introduce TextTranSpotter (TTS), a transformer-based approach for text spotting and the first text spotting framework which may be trained with both fully- and weakly-supervised settings. By learning a single latent representation per word detection, and using a novel loss function based on the Hungarian loss, our method alleviates the need for expensive localization annotations. Trained with only text transcription annotations on real data, our weakly-supervised method achieves competitive performance with previous state-of-the-art fully-supervised methods. When trained in a fully-supervised manner, TextTranSpotter shows state-of-the-art results on multiple benchmarks.

Citation

Pages: 4594-4603

Number of pages: 10

