Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

Cited by: 210

Authors:

Fang, Shancheng ^{[1]}

Xie, Hongtao ^{[1]}

Wang, Yuxin ^{[1]}

Mao, Zhendong ^{[1]}

Zhang, Yongdong ^{[1]}

Affiliations:

[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China

Source:

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021

Keywords:

DOI:

10.1109/CVPR46437.2021.00702

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Linguistic knowledge is of great benefit to scene text recognition. However, how to effectively model linguistic rules in end-to-end deep networks remains a research challenge. In this paper, we argue that the limited capacity of language models comes from: 1) implicitly language modeling; 2) unidirectional feature representation; and 3) language model with noise input. Correspondingly, we propose an autonomous, bidirectional and iterative ABINet for scene text recognition. Firstly, the autonomous suggests to block gradient flow between vision and language models to enforce explicitly language modeling. Secondly, a novel bidirectional cloze network (BCN) as the language model is proposed based on bidirectional feature representation. Thirdly, we propose an execution manner of iterative correction for language model which can effectively alleviate the impact of noise input. Additionally, based on the ensemble of iterative predictions, we propose a self-training method which can learn from unlabeled images effectively. Extensive experiments indicate that ABINet has superiority on low-quality images and achieves state-of-the-art results on several mainstream benchmarks. Besides, the ABINet trained with ensemble self-training shows promising improvement in realizing human-level recognition.
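Illustrative note (not part of the original record): the cloze-style, bidirectional prediction and the iterative correction described in the abstract can be pictured with a small toy sketch. Everything here is an assumption made for illustration: the names `LEXICON`, `cloze_fill`, and `iterative_correction` are hypothetical, a tiny word list stands in for the learned language model, and `max_iters=3` is an arbitrary choice. The gradient blocking between vision and language models is a training-time detail and is not shown.

```python
# Toy sketch only: a small lexicon stands in for a learned language model.
LEXICON = ["house", "horse", "mouse", "hotel"]  # hypothetical vocabulary


def cloze_fill(text, pos, lexicon):
    """Re-predict the character at `pos` from both left and right context.

    The character at `pos` is treated as masked. Each lexicon word of the same
    length proposes its own character at `pos`, scored by how many of the
    other positions agree with `text`.
    """
    votes = {}
    for word in lexicon:
        if len(word) != len(text):
            continue
        score = sum(
            1
            for i, (a, b) in enumerate(zip(word, text))
            if i != pos and a == b
        )
        votes[word[pos]] = max(votes.get(word[pos], 0), score)
    if not votes:
        return text[pos]
    best = max(votes.values())
    # Keep the current character when it is among the best-scoring candidates.
    if votes.get(text[pos], -1) == best:
        return text[pos]
    return min(c for c, s in votes.items() if s == best)


def iterative_correction(vision_prediction, lexicon, max_iters=3):
    """Feed the refined string back in until it stops changing."""
    text = vision_prediction
    for _ in range(max_iters):
        refined = "".join(
            cloze_fill(text, i, lexicon) for i in range(len(text))
        )
        if refined == text:
            break
        text = refined
    return text


# A noisy "vision" output with one wrong character.
print(iterative_correction("hcuse", LEXICON))  # house
```

In this toy, each position is predicted from context on both sides rather than left to right, and the refined string is fed back as the next input, which loosely mirrors the abstract's point that iteration reduces the impact of noisy input.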


Pages: 7094-7103

Page count: 10

42 entries in total

[31] Scene text recognition via context modeling for low-quality image in logistics industry
Heng, Herui
Li, Peiji
Guan, Tuxin
Yang, Tianyu
COMPLEX & INTELLIGENT SYSTEMS, 2023, 9 (03) : 3229 - 3248
[32] Scene text recognition via context modeling for low-quality image in logistics industry
Herui Heng
Peiji Li
Tuxin Guan
Tianyu Yang
Complex & Intelligent Systems, 2023, 9 : 3229 - 3248
[33] Collaborative Encoding Method for Scene Text Recognition in Low Linguistic Resources: The Uyghur Language Case Study
Xu, Miaomiao
Zhang, Jiang
Xu, Lianghui
Silamu, Wushour
Li, Yanbing
APPLIED SCIENCES-BASEL, 2024, 14 (05):
[34] Optimization integrated generative adversarial network for occluded text recognition with language modeling
Selvaraj, Selvin Ebenezer
Tripuraribhatla, Raghuveera
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2023, 35 (08):
[35] Scene text recognition via dual character counting-aware visual and semantic modeling network
Ke XIAO
Anna ZHU
Brian Kenji IWANA
Cheng-Lin LIU
Science China (Information Sciences), 2024, 67 (03) : 313 - 314
[36] Scene text recognition via dual character counting-aware visual and semantic modeling network
Xiao, Ke
Zhu, Anna
Iwana, Brian Kenji
Liu, Cheng-Lin
SCIENCE CHINA-INFORMATION SCIENCES, 2024, 67 (03)
[37] Convolutional neural network with joint stepwise character/word modeling based system for scene text recognition
Riadh Harizi
Rim Walha
Fadoua Drira
Mourad Zaied
Multimedia Tools and Applications, 2022, 81 : 3091 - 3106
[38] Convolutional neural network with joint stepwise character/word modeling based system for scene text recognition
Harizi, Riadh
Walha, Rim
Drira, Fadoua
Zaied, Mourad
MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (03) : 3091 - 3106
[39] Language modeling for spontaneous speech recognition based on disfluency labeling and generation of disfluent text
Horii, Koharu
Ohta, Kengo
Nishimura, Ryota
Ogawa, Atsunori
Kitaoka, Norihide
2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 1851 - 1856
[40] Research-based-named Entity Recognition Learning Text Biomedical Extraction by Adoption of Training Bidirectional Language Model (BiLM)
Abed, Alshreef
Jingling, Yuan
Li, Lin
Journal of Computers (Taiwan), 2020, 31 (04) : 157 - 173
