Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

Cited by: 210
Authors
Fang, Shancheng [1]
Xie, Hongtao [1]
Wang, Yuxin [1]
Mao, Zhendong [1]
Zhang, Yongdong [1]
Affiliation
[1] University of Science and Technology of China, Hefei, Anhui, People's Republic of China
DOI
10.1109/CVPR46437.2021.00702
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Linguistic knowledge is of great benefit to scene text recognition. However, how to effectively model linguistic rules in end-to-end deep networks remains a research challenge. In this paper, we argue that the limited capacity of language models comes from: 1) implicit language modeling; 2) unidirectional feature representation; and 3) noisy input to the language model. Correspondingly, we propose ABINet, an autonomous, bidirectional and iterative network for scene text recognition. Firstly, the autonomous principle blocks gradient flow between the vision and language models to enforce explicit language modeling. Secondly, a novel bidirectional cloze network (BCN) is proposed as the language model, based on bidirectional feature representation. Thirdly, we propose an iterative correction scheme for the language model that effectively alleviates the impact of noisy input. Additionally, based on an ensemble of the iterative predictions, we propose a self-training method that learns effectively from unlabeled images. Extensive experiments indicate that ABINet is superior on low-quality images and achieves state-of-the-art results on several mainstream benchmarks. Moreover, ABINet trained with ensemble self-training shows promising improvement toward human-level recognition.
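The abstract names three mechanisms:

1. Gradient blocking between the vision and language models.
2. A bidirectional cloze language model.
3. Iterative correction.

The pure-Python toy below is a minimal sketch of how the second and third mechanisms interact. It is not the authors' method and rests on several assumptions:

- A count table mapping a (left neighbour, right neighbour) pair to a character stands in for BCN.
- A "keep the more confident prediction" rule stands in for the learned fusion of vision and language outputs.
- Every function name (`cloze_mask`, `build_context_table`, `cloze_predict`, `fuse`, `iterative_correction`) is hypothetical.

The toy has no gradients, so the autonomous principle (gradient blocking) is not modelled. In a PyTorch model, one would typically detach the language model's input with `Tensor.detach()` to get that effect.

```python
from collections import Counter, defaultdict

# Toy sketch only: a count-based "language model" standing in for BCN.
# All names are hypothetical and not taken from the ABINet codebase.
# No gradients exist here, so gradient blocking (autonomous principle) is not modelled.


def cloze_mask(n):
    """n x n attention mask (True = may attend). The diagonal is blocked, so
    position i sees both left and right context but never itself."""
    return [[i != j for j in range(n)] for i in range(n)]


def build_context_table(words):
    """Count which character occurs between each (left, right) neighbour pair.
    '^' and '$' mark word boundaries."""
    table = defaultdict(Counter)
    for w in words:
        padded = "^" + w + "$"
        for k in range(1, len(padded) - 1):
            table[(padded[k - 1], padded[k + 1])][padded[k]] += 1
    return table


def cloze_predict(text, table):
    """Bidirectional cloze step: predict every position from its two neighbours
    only (the idea cloze_mask encodes). Unseen contexts keep the input char
    with probability 0.0."""
    padded = "^" + text + "$"
    preds = []
    for k in range(1, len(padded) - 1):
        counts = table.get((padded[k - 1], padded[k + 1]))
        if counts:
            char, c = counts.most_common(1)[0]
            preds.append((char, c / sum(counts.values())))
        else:
            preds.append((padded[k], 0.0))
    return preds


def fuse(vision, language):
    """Toy fusion (assumption): per position keep the more confident (char, prob)
    pair; ties go to the vision prediction."""
    return [l if l[1] > v[1] else v for v, l in zip(vision, language)]


def iterative_correction(vision, table, iterations=3):
    """Feed the previous fused output back into the language model, always
    fusing against the original vision output; stop early at a fixed point."""
    fused = vision
    for _ in range(iterations):
        text = "".join(ch for ch, _ in fused)
        new = fuse(vision, cloze_predict(text, table))
        if new == fused:
            break
        fused = new
    return "".join(ch for ch, _ in fused)


if __name__ == "__main__":
    table = build_context_table(["hello", "help", "world"])
    # Vision output with a low-confidence misread 'a' at position 2.
    vision = [("h", 0.9), ("a", 0.4), ("l", 0.9), ("l", 0.9), ("o", 0.9)]
    print(iterative_correction(vision, table))  # hello
```

Each position is predicted only from its two neighbours, which is the cloze idea the diagonal mask expresses. Re-running the language model on the fused output gives later passes a cleaner input than the raw vision prediction. Fusing against the vision output also stops the toy table's weaker guesses from overwriting confident vision characters. For example, the final "d" of "world" survives even though the table alone would guess "o".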
Pages: 7094-7103
Number of pages: 10