Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

Cited by: 210
Authors
Fang, Shancheng [1]
Xie, Hongtao [1]
Wang, Yuxin [1]
Mao, Zhendong [1]
Zhang, Yongdong [1]
Affiliation
[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China
DOI
10.1109/CVPR46437.2021.00702
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Linguistic knowledge is of great benefit to scene text recognition. However, how to effectively model linguistic rules in end-to-end deep networks remains a research challenge. In this paper, we argue that the limited capacity of language models comes from: 1) implicit language modeling; 2) unidirectional feature representation; and 3) language models with noisy input. Correspondingly, we propose an autonomous, bidirectional and iterative network, ABINet, for scene text recognition. Firstly, "autonomous" means blocking gradient flow between the vision and language models to enforce explicit language modeling. Secondly, a novel bidirectional cloze network (BCN) is proposed as the language model, based on bidirectional feature representation. Thirdly, we propose an iterative correction scheme for the language model, which effectively alleviates the impact of noisy input. Additionally, based on the ensemble of iterative predictions, we propose a self-training method that can learn effectively from unlabeled images. Extensive experiments indicate that ABINet performs better on low-quality images and achieves state-of-the-art results on several mainstream benchmarks. Moreover, ABINet trained with ensemble self-training shows promising progress toward human-level recognition.
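The iterative correction idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the lexicon-based `cloze_language_model`, the equal-weight `fuse` step, the `LEXICON` list and the example `vision` probabilities are all hypothetical stand-ins chosen for illustration, and the training-time gradient blocking between the vision and language models is not shown. The sketch only demonstrates the loop in which a cloze-style language model predicts each character from the other positions, the result is fused with the vision prediction, and the fused prediction is fed back in.

```python
from typing import Dict, List

Dist = Dict[str, float]  # character -> probability at one position

# Hypothetical toy vocabulary standing in for learned linguistic knowledge.
LEXICON = ["house", "horse", "mouse"]


def argmax_text(dists: List[Dist]) -> str:
    """Read out the most likely character at every position."""
    return "".join(max(d, key=d.get) for d in dists)


def cloze_language_model(dists: List[Dist]) -> List[Dist]:
    """Predict each character from the other positions only (cloze style)."""
    text = argmax_text(dists)
    out: List[Dist] = []
    for i in range(len(text)):
        votes: Dist = {}
        for word in LEXICON:
            if len(word) != len(text):
                continue
            # Agreement with the context, ignoring the masked position i.
            score = sum(1 for j, c in enumerate(word) if j != i and c == text[j])
            votes[word[i]] = votes.get(word[i], 0.0) + 2.0 ** score
        total = sum(votes.values())
        # Fall back to the input distribution if no lexicon word fits.
        out.append({c: v / total for c, v in votes.items()} if total else dict(dists[i]))
    return out


def fuse(vision: List[Dist], language: List[Dist]) -> List[Dist]:
    """Average vision and language probabilities (an assumed, simple fusion)."""
    fused: List[Dist] = []
    for v, l in zip(vision, language):
        chars = set(v) | set(l)
        fused.append({c: 0.5 * v.get(c, 0.0) + 0.5 * l.get(c, 0.0) for c in chars})
    return fused


def recognize(vision: List[Dist], iterations: int = 3) -> str:
    """Iteratively refine the prediction that is fed to the language model."""
    current = vision
    for _ in range(iterations):
        language = cloze_language_model(current)
        current = fuse(vision, language)
    return argmax_text(current)


# Made-up vision output for an image of "house" with a noisy third character.
vision = [
    {"h": 0.6, "m": 0.4},
    {"o": 1.0},
    {"n": 0.5, "u": 0.45, "r": 0.05},
    {"s": 1.0},
    {"e": 1.0},
]
print(argmax_text(vision))  # vision-only reading
print(recognize(vision))    # reading after iterative correction
```

In this toy setup the vision-only reading contains an error at the noisy position, and the language model's context-based vote shifts the fused prediction toward a lexicon word. In the paper's setting the language model is a learned network rather than a lexicon lookup.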
Pages: 7094-7103
Number of pages: 10
Related papers
42 in total
  • [21] FULLY SHAREABLE SCENE TEXT RECOGNITION MODELING FOR HORIZONTAL AND VERTICAL WRITING
    Orihashi, Shota
    Yamazaki, Yoshihiro
    Uchida, Mihiro
    Takashima, Akihiko
    Masumura, Ryo
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 2636 - 2640
  • [22] FULLY SHAREABLE SCENE TEXT RECOGNITION MODELING FOR HORIZONTAL AND VERTICAL WRITING
    Orihashi, Shota
    Yamazaki, Yoshihiro
    Uchida, Mihiro
    Takashima, Akihiko
    Masumura, Ryo
    Proceedings - International Conference on Image Processing, ICIP, 2022, : 2636 - 2640
  • [23] Automatic text recognition in natural scene and its translation into user defined language
    Bijalwan, Deepak Chandra
    Aggarwal, Alok
    2014 INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND GRID COMPUTING (PDGC), 2014, : 324 - 329
  • [24] Recognition of visual scene elements from a story text in Persian natural language
    Hashemi-Namin, Mojdeh
    Jahed-Motlagh, Mohammad Reza
    Rahmani, Adel Torkaman
    NATURAL LANGUAGE ENGINEERING, 2023, 29 (03) : 693 - 719
  • [25] PETR: Rethinking the Capability of Transformer-Based Language Model in Scene Text Recognition
    Wang, Yuxin
    Xie, Hongtao
    Fang, Shancheng
    Xing, Mengting
    Wang, Jing
    Zhu, Shenggao
    Zhang, Yongdong
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 5585 - 5598
  • [26] From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network
    Wang, Yuxin
    Xie, Hongtao
    Fang, Shancheng
    Wang, Jing
    Zhu, Shenggao
    Zhang, Yongdong
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 14174 - 14183
  • [27] Radar technical language modeling with named entity recognition and text classification
    Zaunegger, Jackson S.
    Singerman, Paul G.
    Narayanan, Ram M.
    O'Rourke, Sean M.
    Rangaswamy, Muralidhar
    RADAR SENSOR TECHNOLOGY XXVI, 2022, 12108
  • [28] Image as a Language: Revisiting Scene Text Recognition via Balanced, Unified and Synchronized Vision-Language Reasoning Network
    Wei, Jiajun
    Zhan, Hongjian
    Lu, Yue
    Tu, Xiao
    Yin, Bing
    Liu, Cong
    Pal, Umapada
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 6, 2024, : 5885 - 5893
  • [29] ESTIMATES OF THE OPERATING TIME OF STABLE ITERATIVE, LANGUAGE-MODELING, AND RECOGNITION SYSTEMS
    TSIVLIN, YV
    CYBERNETICS, 1987, 23 (03): : 351 - 361
  • [30] Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition
    Gao, Zuan
    Wang, Yuxin
    Qu, Yadong
    Zhang, Boqiang
    Wang, Zixiao
    Xu, Jianjun
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024, : 767 - 775