Self-supervised Implicit Glyph Attention for Text Recognition

Cited by: 13

Authors:

Guan, Tongkun [1]

Gu, Chaochen [2]

Tu, Jingzheng [2]

Yang, Xue [1]

Feng, Qi [2]

Zhao, Yudi [2]

Shen, Wei [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, AI Inst, MoE Key Lab Artificial Intelligence, Shanghai, Peoples R China

[2] Shanghai Jiao Tong Univ, Dept Automat, Shanghai, Peoples R China

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023

Funding:

Natural Science Foundation of Shanghai;

Keywords:

NETWORK;

DOI:

10.1109/CVPR52729.2023.01467

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The attention mechanism has become the de facto module in scene text recognition (STR) methods, due to its capability of extracting character-level representations. These methods can be divided into implicit-attention-based and supervised-attention-based approaches, depending on how the attention is computed: implicit attention is learned from sequence-level text annotations, whereas supervised attention is learned from character-level bounding box annotations. Implicit attention may extract coarse or even incorrect spatial regions as character attention, so it is prone to alignment drift. Supervised attention can alleviate this issue, but it is character category-specific: it requires extra, laborious character-level bounding box annotations and becomes memory-intensive when handling languages with a large number of character categories. To address these issues, we propose a novel attention mechanism for STR, self-supervised implicit glyph attention (SIGA). SIGA delineates the glyph structures of text images by jointly performing self-supervised text segmentation and implicit attention alignment, which serve as the supervision to improve attention correctness without extra character-level annotations. Experimental results demonstrate that SIGA performs consistently and significantly better than previous attention-based STR methods, in terms of both attention correctness and final recognition performance, on publicly available context benchmarks and our contributed contextless benchmarks.

Pages: 15285-15294

Page count: 10
