Scene text recognition based on two-stage attention and multi-branch feature fusion module

Cited by: 3
Authors
Xia, Shifeng [1,2]
Kou, Jinqiao [3]
Liu, Ningzhong [1,2]
Yin, Tianxiang [1,2]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 211106, Peoples R China
[2] Collaborat Innovat Ctr Novel Soft Ware Technol &, MIIT Key Lab Pattern Anal & Machine Intelligence, Nanjing 211106, Peoples R China
[3] Beijing Inst Comp Technol & Applicat, Fangzhou Key Lab, Beijing 100854, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Scene text recognition; Transformer; Attention mechanism; Feature fusion; NETWORK; EFFICIENT;
DOI
10.1007/s10489-022-04241-5
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text image recognition in natural scenes is challenging in computer vision, even though it is already widely used in real-life applications. With the development of deep learning, the accuracy of scene text recognition has been continuously improved. The encoder-decoder architecture is currently a general framework in scene text recognition. With 2D attention on the decoder, better attention can be paid to the position of each character. However, many methods based on the encoder-decoder architecture only adopt the attention mechanism on the decoder. Therefore, their ability to locate characters is limited. In order to solve this problem, we propose a Transformer-based encoder-decoder structure with a two-stage attention mechanism for scene text recognition. At the encoder, a first-stage attention module integrating spatial attention and channel attention is used to capture the overall location of the text in the image, while at the decoder, a second-stage attention module is used to pinpoint the position of each character in the text image. This two-stage attention mechanism can locate the position of the text more effectively and improve recognition accuracy. Also, we design a multi-branch feature fusion module for the encoder that can fuse features from different receptive fields to obtain more robust features. We train the model on synthetic text datasets and test it on real scene text datasets. The experimental results show that our model is very competitive.
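The first-stage attention described in the abstract combines channel attention and spatial attention at the encoder. The following is a minimal illustrative sketch of that general idea, not the authors' implementation: it assumes parameter-free sigmoid gates computed from average pooling, whereas the paper's module is learned. The function names (`channel_attention`, `spatial_attention`, `first_stage_attention`) and the nested-list feature map layout (channels, then rows, then columns) are hypothetical choices made for this example.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap):
    """Scale each channel by a gate derived from its global average.

    Assumption: the gate is sigmoid(mean of the channel); a learned
    module would pass the pooled vector through trainable layers.
    """
    out = []
    for channel in fmap:
        mean = sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        gate = sigmoid(mean)
        out.append([[value * gate for value in row] for row in channel])
    return out


def spatial_attention(fmap):
    """Scale each (y, x) position by a gate from the cross-channel mean.

    Assumption: the gate is sigmoid(mean over channels at that position);
    a learned module would typically apply a convolution here.
    """
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    gates = [
        [sigmoid(sum(fmap[k][y][x] for k in range(c)) / c) for x in range(w)]
        for y in range(h)
    ]
    return [
        [[fmap[k][y][x] * gates[y][x] for x in range(w)] for y in range(h)]
        for k in range(c)
    ]


def first_stage_attention(fmap):
    """Apply channel attention, then spatial attention (order is an assumption)."""
    return spatial_attention(channel_attention(fmap))
```

Calling `first_stage_attention` on a feature map returns a map of the same shape in which every value has been multiplied by two gates in (0, 1), so channels and positions with stronger average activation are attenuated less than weaker ones.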
Pages: 14219-14232
Page count: 14
Related papers
50 in total
  • [31] Remote Sensing Scene Classification via Multi-Branch Local Attention Network
    Chen, Si-Bao
    Wei, Qing-Song
    Wang, Wen-Zhong
    Tang, Jin
    Luo, Bin
    Wang, Zu-Yuan
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 99 - 109
  • [32] Multi-branch feature fusion and refinement network for salient object detection
    Yang, Jinyu
    Shi, Yanjiao
    Zhang, Jin
    Guo, Qianqian
    Zhang, Qing
    Cui, Liu
    MULTIMEDIA SYSTEMS, 2024, 30 (04)
  • [33] Multi-Branch Attention Fusion Network for Cloud and Cloud Shadow Segmentation
    Gu, Hongde
    Gu, Guowei
    Liu, Yi
    Lin, Haifeng
    Xu, Yao
    REMOTE SENSING, 2024, 16 (13)
  • [34] Scene Recognition Based on Multi-feature Fusion for Indoor Robot
    Liu, Xiaocheng
    Hong, Wei
    Lu, Huiqiu
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT III, 2017, 10636 : 160 - 169
  • [35] Multi-Branch CNN GRU with attention mechanism for human action recognition
    Verma, Updesh
    Tyagi, Pratibha
    Aneja, Manpreet Kaur
    ENGINEERING RESEARCH EXPRESS, 2023, 5 (02):
  • [36] Two-Stage Feature Selection for Text Classification
    Ozgur, Levent
    Gungor, Tunga
    INFORMATION SCIENCES AND SYSTEMS 2015, 2016, 363 : 329 - 337
  • [37] Enhanced Attention Tracking With Multi-Branch Network for Egocentric Activity Recognition
    Liu, Tianshan
    Lam, Kin-Man
    Zhao, Rui
    Kong, Jun
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (06) : 3587 - 3602
  • [38] Coal-gangue sound recognition using hybrid multi-branch CNN based on attention mechanism fusion in noisy environments
    Song, Qingjun
    Hao, Wenchao
    Song, Qinghui
    Jiang, Haiyan
    Li, Kai
    Sun, Shirong
    SCIENTIFIC REPORTS, 2024, 14 (01):
  • [39] Substation Abnormal Scene Recognition Based on Two-Stage Contrastive Learning
    Liu, Shanfeng
    Su, Haitao
    Mao, Wandeng
    Li, Miaomiao
    Zhang, Jun
    Bao, Hua
    ENERGIES, 2024, 17 (24)
  • [40] A vehicle re-identification framework based on the improved multi-branch feature fusion network
    Leilei Rong
    Yan Xu
    Xiaolei Zhou
    Lisu Han
    Linghui Li
    Xuguang Pan
    Scientific Reports, 11