Self-supervised Implicit Glyph Attention for Text Recognition

Cited by: 13
Authors
Guan, Tongkun [1]
Gu, Chaochen [2]
Tu, Jingzheng [2]
Yang, Xue [1]
Feng, Qi [2]
Zhao, Yudi [2]
Shen, Wei [1]
Affiliations
[1] Shanghai Jiao Tong Univ, AI Inst, MoE Key Lab Artificial Intelligence, Shanghai, Peoples R China
[2] Shanghai Jiao Tong Univ, Dept Automat, Shanghai, Peoples R China
Funding
Natural Science Foundation of Shanghai
Keywords
NETWORK
DOI
10.1109/CVPR52729.2023.01467
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The attention mechanism has become the de facto module in scene text recognition (STR) methods, owing to its capability of extracting character-level representations. These methods can be divided into implicit-attention-based and supervised-attention-based approaches, depending on how the attention is computed: implicit attention is learned from sequence-level text annotations, whereas supervised attention is learned from character-level bounding box annotations. Because implicit attention may extract coarse or even incorrect spatial regions as character attention, it is prone to alignment drift. Supervised attention can alleviate this issue, but it is character category-specific: it requires extra, laborious character-level bounding box annotations and becomes memory-intensive when handling languages with a large number of character categories. To address these issues, we propose a novel attention mechanism for STR, self-supervised implicit glyph attention (SIGA). SIGA delineates the glyph structures of text images by jointly performing self-supervised text segmentation and implicit attention alignment; the resulting glyph structures serve as supervision to improve attention correctness without extra character-level annotations. Experimental results demonstrate that SIGA performs consistently and significantly better than previous attention-based STR methods, in terms of both attention correctness and final recognition performance, on publicly available context benchmarks and our contributed contextless benchmarks.
Pages: 15285-15294
Number of pages: 10