Self-supervised Implicit Glyph Attention for Text Recognition

Cited by: 13
Authors
Guan, Tongkun [1]
Gu, Chaochen [2]
Tu, Jingzheng [2]
Yang, Xue [1]
Feng, Qi [2]
Zhao, Yudi [2]
Shen, Wei [1]
Affiliations
[1] Shanghai Jiao Tong Univ, AI Inst, MoE Key Lab Artificial Intelligence, Shanghai, Peoples R China
[2] Shanghai Jiao Tong Univ, Dept Automat, Shanghai, Peoples R China
Funding
Natural Science Foundation of Shanghai;
Keywords
NETWORK;
DOI
10.1109/CVPR52729.2023.01467
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The attention mechanism has become the de facto module in scene text recognition (STR) methods, owing to its ability to extract character-level representations. These methods fall into two categories, implicit-attention-based and supervised-attention-based, depending on how the attention is computed: implicit attention is learned from sequence-level text annotations, whereas supervised attention is learned from character-level bounding-box annotations. Implicit attention may extract coarse or even incorrect spatial regions as character attention, so it is prone to alignment drift. Supervised attention can alleviate this issue, but it is character-category-specific: it requires laborious extra character-level bounding-box annotations and becomes memory-intensive when handling languages with many character categories. To address these issues, we propose a novel attention mechanism for STR, self-supervised implicit glyph attention (SIGA). SIGA delineates the glyph structures of text images through joint self-supervised text segmentation and implicit attention alignment, and these glyph structures serve as supervision that improves attention correctness without extra character-level annotations. Experimental results demonstrate that SIGA performs consistently and significantly better than previous attention-based STR methods, in terms of both attention correctness and final recognition performance, on publicly available context benchmarks and on our contributed contextless benchmarks.
Pages: 15285-15294
Page count: 10