Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

Cited by: 210

Authors:

Fang, Shancheng [1]

Xie, Hongtao [1]

Wang, Yuxin [1]

Mao, Zhendong [1]

Zhang, Yongdong [1]

Affiliation:

[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China

Source:

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021


DOI:

10.1109/CVPR46437.2021.00702

CLC classification number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Linguistic knowledge is of great benefit to scene text recognition. However, how to effectively model linguistic rules in end-to-end deep networks remains a research challenge. In this paper, we argue that the limited capacity of language models comes from: 1) implicit language modeling; 2) unidirectional feature representation; and 3) language models with noisy input. Correspondingly, we propose ABINet, an autonomous, bidirectional and iterative network for scene text recognition. First, the autonomous design blocks gradient flow between the vision and language models to enforce explicit language modeling. Second, a novel bidirectional cloze network (BCN) is proposed as the language model, based on bidirectional feature representation. Third, we propose an iterative correction scheme for the language model that effectively alleviates the impact of noisy input. Additionally, based on the ensemble of iterative predictions, we propose a self-training method that learns effectively from unlabeled images. Extensive experiments indicate that ABINet is superior on low-quality images and achieves state-of-the-art results on several mainstream benchmarks. Moreover, ABINet trained with ensemble self-training shows promising improvement toward human-level recognition.
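The abstract combines a cloze-style bidirectional language model with iterative correction. The toy sketch below illustrates how those two ideas interact, under heavy assumptions: a character-bigram model (the hypothetical `train_bigrams` and `cloze_distribution`) stands in for the transformer-based BCN, a simple product-of-experts `fuse` stands in for the paper's fusion step, and every name, lexicon entry and probability is made up for illustration. The property it shows is that the language-model prediction at position `i` reads only the neighbouring characters, never the possibly wrong input character at `i`, so repeated passes can overwrite a misread glyph. Being plain Python, it has no gradients; in a PyTorch network, blocking gradient flow into the language model would typically be done by detaching its input, for example with `Tensor.detach()`.

```python
from collections import Counter
import string

ALPHABET = string.ascii_lowercase
BOS, EOS = "<", ">"  # hypothetical word-boundary markers

def train_bigrams(words):
    """Count character bigrams (with boundary markers) from a toy lexicon."""
    pair_counts, left_totals = Counter(), Counter()
    for w in words:
        padded = BOS + w + EOS
        for a, b in zip(padded, padded[1:]):
            pair_counts[(a, b)] += 1
            left_totals[a] += 1
    return pair_counts, left_totals

def bigram_prob(model, a, b):
    """Add-one smoothed P(b | a) over the 26 letters plus EOS."""
    pair_counts, left_totals = model
    vocab = len(ALPHABET) + 1
    return (pair_counts[(a, b)] + 1) / (left_totals[a] + vocab)

def cloze_distribution(model, tokens, i):
    """Score each letter at position i from its LEFT and RIGHT neighbours only.

    tokens[i] itself is never read (the cloze property), so the model cannot
    simply copy a misrecognised character.
    """
    left = tokens[i - 1] if i > 0 else BOS
    right = tokens[i + 1] if i + 1 < len(tokens) else EOS
    scores = {c: bigram_prob(model, left, c) * bigram_prob(model, c, right)
              for c in ALPHABET}
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

def fuse(p_vision, p_lang):
    """Product-of-experts fusion: an assumption, not the paper's fusion module."""
    joint = {c: p_vision[c] * p_lang[c] for c in ALPHABET}
    z = sum(joint.values())
    return {c: v / z for c, v in joint.items()}

def argmax(dist):
    return max(dist, key=dist.get)

def iterative_correction(vision_probs, model, n_iters=3):
    """Feed the fused prediction back into the language model n_iters times."""
    tokens = [argmax(p) for p in vision_probs]  # initial guess from vision alone
    for _ in range(n_iters):
        # Every position is refined from the previous iteration's tokens.
        lang = [cloze_distribution(model, tokens, i) for i in range(len(tokens))]
        fused = [fuse(pv, pl) for pv, pl in zip(vision_probs, lang)]
        tokens = [argmax(p) for p in fused]
    return "".join(tokens)

def make_dist(peaks):
    """Hypothetical vision output: given peak probabilities, rest spread uniformly."""
    rest = [c for c in ALPHABET if c not in peaks]
    leftover = (1.0 - sum(peaks.values())) / len(rest)
    return {c: peaks.get(c, leftover) for c in ALPHABET}

LEXICON = ["cat", "car", "can", "hat", "bat"]  # toy training text (hypothetical)
MODEL = train_bigrams(LEXICON)
# The vision model is unsure about the middle glyph and slightly prefers "o".
VISION = [make_dist({"c": 0.9}), make_dist({"o": 0.5, "a": 0.4}), make_dist({"t": 0.9})]

print(iterative_correction(VISION, MODEL, n_iters=0))  # vision only: cot
print(iterative_correction(VISION, MODEL, n_iters=3))  # with correction: cat
```

With vision alone the toy recognizer reads "cot"; on the first pass the bigram statistics of the lexicon pull the middle character toward "a", and later passes leave "cat" unchanged. The vision distributions stay fixed across iterations while only the language-model input is refreshed, which mirrors the "language model with noisy input" problem the abstract describes.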


Pages: 7094-7103

Number of pages: 10
