Scene text recognition based on two-stage attention and multi-branch feature fusion module

Times cited: 3
Authors
Xia, Shifeng [1, 2]
Kou, Jinqiao [3]
Liu, Ningzhong [1, 2]
Yin, Tianxiang [1, 2]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 211106, Peoples R China
[2] Collaborat Innovat Ctr Novel Soft Ware Technol &, MIIT Key Lab Pattern Anal & Machine Intelligence, Nanjing 211106, Peoples R China
[3] Beijing Inst Comp Technol & Applicat, Fangzhou Key Lab, Beijing 100854, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Scene text recognition; Transformer; Attention mechanism; Feature fusion; NETWORK; EFFICIENT
DOI
10.1007/s10489-022-04241-5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recognizing text in natural scene images remains a challenging computer vision problem, even though scene text recognition is already widely used in real-world applications. With the development of deep learning, scene text recognition accuracy has improved steadily. The encoder-decoder architecture is currently a widely used general framework for scene text recognition, and applying 2D attention in the decoder lets the model focus more precisely on the position of each character. However, many encoder-decoder methods apply attention only in the decoder, which limits their ability to locate characters. To address this problem, we propose a Transformer-based encoder-decoder with a two-stage attention mechanism for scene text recognition. In the encoder, a first-stage attention module that integrates spatial attention and channel attention captures the overall location of the text in the image; in the decoder, a second-stage attention module pinpoints the position of each character in the text image. This two-stage attention mechanism locates text more effectively and improves recognition accuracy. In addition, we design a multi-branch feature fusion module for the encoder that fuses features from different receptive fields to obtain more robust features. We train the model on synthetic text datasets and evaluate it on real scene text datasets. Experimental results show that the model is highly competitive.
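The abstract names three components (a first-stage spatial and channel attention module and a multi-branch feature fusion module in the encoder, and a second-stage per-character attention in the decoder) but not their internals. The toy NumPy sketch below only illustrates the general ideas under explicit assumptions and is not the authors' implementation: the first stage is modeled on CBAM-style channel-then-spatial attention, the "different receptive fields" of the fusion module are approximated by mean filters of sizes 1, 3 and 5 followed by a 1x1 channel projection, and the second stage is a single-head scaled dot-product attention of one decoding query over the flattened 2D feature map. The Transformer layers themselves are omitted, and all function names, shapes and (random, untrained) weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical random, untrained weights


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def box_filter(x, k):
    """k x k mean filter (zero padding, stride 1) applied to each channel of x, shape (C, H, W)."""
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, w = x.shape[1:]
    out = np.zeros(x.shape)
    for dy in range(k):
        for dx in range(k):
            out += xp[:, dy:dy + h, dx:dx + w]
    return out / (k * k)


def multi_branch_fusion(x, kernel_sizes=(1, 3, 5)):
    """Assumed design: one branch per receptive field, 1x1 channel projection + ReLU, branches summed."""
    c = x.shape[0]
    fused = np.zeros(x.shape)
    for k in kernel_sizes:
        proj = rng.standard_normal((c, c)) / np.sqrt(c)              # 1x1 conv weights
        branch = np.einsum("oc,chw->ohw", proj, box_filter(x, k))
        fused += np.maximum(branch, 0.0)                              # ReLU, then sum
    return fused


def channel_attention(x, reduction=2):
    """CBAM-style: avg- and max-pooled channel descriptors through a shared MLP, then sigmoid."""
    c = x.shape[0]
    w1 = rng.standard_normal((c // reduction, c)) / np.sqrt(c)
    w2 = rng.standard_normal((c, c // reduction)) / np.sqrt(c // reduction)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)

    a = sigmoid(mlp(x.mean(axis=(1, 2))) + mlp(x.max(axis=(1, 2))))  # (C,)
    return x * a[:, None, None], a


def spatial_attention(x):
    """CBAM-style: channel-wise avg/max maps, 3x3 smoothing (stand-in for a learned conv), sigmoid."""
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])               # (2, H, W)
    w = rng.standard_normal(2)
    s = sigmoid(np.tensordot(w, box_filter(pooled, 3), axes=1))      # (H, W)
    return x * s[None, :, :], s


def decoder_2d_attention(query, feat):
    """Second stage: scaled dot-product attention of one decoding query over all H x W positions."""
    c, h, w = feat.shape
    keys = feat.reshape(c, h * w).T                                   # (H*W, C)
    scores = keys @ query / np.sqrt(c)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                                          # softmax over positions
    context = weights @ keys                                          # (C,) glimpse for one character
    return context, weights.reshape(h, w)


# Toy run: C=8 channels on a 4 x 16 feature map (a short, wide text-line crop).
x = rng.standard_normal((8, 4, 16))
feat = multi_branch_fusion(x)
feat, chan_w = channel_attention(feat)    # first stage, channel part
feat, spat_w = spatial_attention(feat)    # first stage, spatial part
glimpse, attn_map = decoder_2d_attention(rng.standard_normal(8), feat)
print(feat.shape, glimpse.shape, attn_map.shape)  # (8, 4, 16) (8,) (4, 16)
```

Channel attention is applied before spatial attention here, following the common CBAM ordering; the abstract does not state that order, nor where the fusion module sits relative to the first-stage attention module, so both placements in this sketch are assumptions.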
Pages: 14219-14232
Number of pages: 14
Related papers
50 in total
  • [1] Scene text recognition based on two-stage attention and multi-branch feature fusion module
    Xia, Shifeng
    Kou, Jinqiao
    Liu, Ningzhong
    Yin, Tianxiang
    APPLIED INTELLIGENCE, 2023, 53 : 14219 - 14232
  • [2] A Multi-Branch Feature Fusion Strategy Based on an Attention Mechanism for Remote Sensing Image Scene Classification
    Shi, Cuiping
    Zhao, Xin
    Wang, Liguo
    REMOTE SENSING, 2021, 13 (10)
  • [3] Multi-branch guided attention network for irregular text recognition
    Wang, Cong
    Liu, Cheng-Lin
    NEUROCOMPUTING, 2021, 425 : 278 - 289
  • [4] Scene Classification Based on Two-Stage Deep Feature Fusion
    Liu, Yishu
    Liu, Yingbin
    Ding, Liwang
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2018, 15 (02) : 183 - 186
  • [5] Remote Sensing Image Scene Classification Based on Deep Multi-branch Feature Fusion Network
    Zhang Tong
    Zheng En-rang
    Shen Jun-ge
    Gao An-tong
    ACTA PHOTONICA SINICA, 2020, 49 (05)
  • [6] Tennis Action Recognition Based on Multi-Branch Mixed Attention
    Zhou, Xianwei
    Chen, Weitao
    Li, Zhenfeng
    Li, Yuan
    Lei, Jiale
    Yu, Songsen
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT II, KSEM 2023, 2023, 14118 : 162 - 175
  • [7] CDBA: a novel multi-branch feature fusion model for EEG-based emotion recognition
    Huang, Zhentao
    Ma, Yahong
    Su, Jianyun
    Shi, Hangyu
    Jia, Shanshan
    Yuan, Baoxi
    Li, Weisu
    Geng, Jingzhi
    Yang, Tingting
    FRONTIERS IN PHYSIOLOGY, 2023, 14
  • [8] Lightweight container code recognition based on multi-reuse feature fusion and multi-branch structure merger
    Yang, Dapeng
    Wang, Guanghui
    Liu, Mingtang
    Yue, Shuang
    Zhang, Hao
    Chen, Xiaokang
    Zhang, Mengxiao
    JOURNAL OF REAL-TIME IMAGE PROCESSING, 2023, 20 (06)
  • [9] A deep temporal network for motor imagery classification based on multi-branch feature fusion and attention mechanism
    Zhao, Jinke
    Liu, Mingliang
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2025, 100