Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer

Cited by: 30

Authors:

Kittenplon, Yair [1]

Lavi, Inbal [1]

Fogel, Sharon [1]

Bar, Yarin [1]

Manmatha, R. [1]

Perona, Pietro [1]

Affiliations:

[1] AWS AI Labs, Cambridge, England

Source:

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) | 2022

Keywords:

RECOGNITION;

DOI:

10.1109/CVPR52688.2022.00456

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Text spotting end-to-end methods have recently gained attention in the literature due to the benefits of jointly optimizing the text detection and recognition components. Existing methods usually have a distinct separation between the detection and recognition branches, requiring exact annotations for the two tasks. We introduce TextTranSpotter (TTS), a transformer-based approach for text spotting and the first text spotting framework which may be trained with both fully- and weakly-supervised settings. By learning a single latent representation per word detection, and using a novel loss function based on the Hungarian loss, our method alleviates the need for expensive localization annotations. Trained with only text transcription annotations on real data, our weakly-supervised method achieves competitive performance with previous state-of-the-art fully-supervised methods. When trained in a fully-supervised manner, TextTranSpotter shows state-of-the-art results on multiple benchmarks.

Pages: 4594-4603

Page count: 10

50 items in total

[41] Weakly-Supervised Semantic Segmentation Using Motion Cues
Tokmakov, Pavel
Alahari, Karteek
Schmid, Cordelia
COMPUTER VISION - ECCV 2016, PT IV, 2016, 9908 : 388 - 404
[42] Weakly-supervised Speech-to-text Mapping with Visually Connected Non-parallel Speech-text Data using Cyclic Partially-aligned Transformer
Effendi, Johanes
Sakti, Sakriani
Nakamura, Satoshi
INTERSPEECH 2021, 2021, : 2257 - 2261
[43] Hypergraph Transformer: Weakly-Supervised Multi-hop Reasoning for Knowledge-based Visual Question Answering
Heo, Yu-Jung
Kim, Eun-Sol
Choi, Woo Suk
Zhang, Byoung-Tak
PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 373 - 390
[44] Scene Text Segmentation via Multi-Task Cascade Transformer With Paired Data Synthesis
Dang, Quang-Vinh
Lee, Guee-Sang
IEEE ACCESS, 2023, 11 : 67791 - 67805
[45] Weakly-Supervised Self-Ensembling Vision Transformer for MRI Cardiac Segmentation
Wang, Ziyang
Mang, Haodong
Liu, Yang
2023 IEEE CONFERENCE ON ARTIFICIAL INTELLIGENCE, CAI, 2023, : 101 - 102
[46] DanceMVP: Self-Supervised Learning for Multi-Task Primitive-Based Dance Performance Assessment via Transformer Text Prompting
Zhong, Yun
Demiris, Yiannis
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 9, 2024, : 10270 - 10278
[47] Transformer Based Prototype Learning for Weakly-Supervised Histopathology Tissue Semantic Segmentation
She, Jinwen
Hu, Yanxu
Ma, Andy J.
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT IV, 2023, 14257 : 203 - 215
[48] Doppler Image-Based Weakly-Supervised Vascular Ultrasound Segmentation with Transformer
Ning, Guochen
Liang, Hanying
Chen, Fang
Zhang, Xinran
Liao, Hongen
2023 IEEE 20TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING, ISBI, 2023,
[49] MTCAM: A Novel Weakly-Supervised Audio-Visual Saliency Prediction Model With Multi-Modal Transformer
Zhu, Dandan
Zhu, Kun
Ding, Weiping
Zhang, Nana
Min, Xiongkuo
Zhai, Guangtao
Yang, Xiaokang
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024, 8 (02): : 1756 - 1771
[50] SparseMorph: A weakly-supervised lightweight sparse transformer for mono- and multi-modal deformable image registration
Bai, Xinhao
Wang, Hongpeng
Qin, Yanding
Han, Jianda
Yu, Ningbo
Computers in Biology and Medicine, 2024, 182