Scene text recognition based on two-stage attention and multi-branch feature fusion module

Cited by: 3
Authors
Xia, Shifeng [1 ,2 ]
Kou, Jinqiao [3 ]
Liu, Ningzhong [1 ,2 ]
Yin, Tianxiang [1 ,2 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 211106, Peoples R China
[2] Collaborat Innovat Ctr Novel Soft Ware Technol &, MIIT Key Lab Pattern Anal & Machine Intelligence, Nanjing 211106, Peoples R China
[3] Beijing Inst Comp Technol & Applicat, Fangzhou Key Lab, Beijing 100854, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Scene text recognition; Transformer; Attention mechanism; Feature fusion; NETWORK; EFFICIENT;
DOI
10.1007/s10489-022-04241-5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recognizing text in natural-scene images remains a challenging computer vision problem, even though it is already widely used in real-life applications. With the development of deep learning, the accuracy of scene text recognition has improved steadily. The encoder-decoder architecture is currently a general framework for scene text recognition, and 2D attention at the decoder helps the model attend to the position of each character. However, many encoder-decoder methods apply the attention mechanism only at the decoder, which limits their ability to locate characters. To address this problem, we propose a Transformer-based encoder-decoder structure with a two-stage attention mechanism for scene text recognition. At the encoder, a first-stage attention module integrating spatial attention and channel attention captures the overall location of the text in the image; at the decoder, a second-stage attention module pinpoints the position of each character in the text image. This two-stage attention mechanism locates the text more effectively and improves recognition accuracy. We also design a multi-branch feature fusion module for the encoder that fuses features from different receptive fields to obtain more robust features. We train the model on synthetic text datasets and test it on real scene text datasets. The experimental results show that our model is very competitive.
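The encoder-side ideas in the abstract (channel attention combined with spatial attention, and fusion of branches with different receptive fields) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not give the module internals, so everything below is an assumption made for illustration. The gates are computed from pooled statistics with no learned weights, a mean filter stands in for each convolutional branch, and the kernel sizes (1, 3, 5) and the order fusion, then channel attention, then spatial attention are hypothetical choices. A real model would use learned layers in a deep learning framework.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(feat):
    # feat[c][y][x]: gate each channel by its global average and max
    # (assumption: no learned MLP, unlike trained attention modules).
    out = []
    for ch in feat:
        vals = [v for row in ch for v in row]
        gate = sigmoid(sum(vals) / len(vals) + max(vals))
        out.append([[v * gate for v in row] for row in ch])
    return out

def spatial_attention(feat):
    # Gate each pixel by the average and max taken across channels.
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for y in range(H):
        for x in range(W):
            col = [feat[c][y][x] for c in range(C)]
            gate = sigmoid(sum(col) / C + max(col))
            for c in range(C):
                out[c][y][x] = feat[c][y][x] * gate
    return out

def box_filter(ch, k):
    # k x k mean filter with edge clamping: a stand-in for a
    # convolutional branch whose receptive field is k x k.
    H, W = len(ch), len(ch[0])
    r = k // 2
    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            window = [ch[min(max(y + dy, 0), H - 1)][min(max(x + dx, 0), W - 1)]
                      for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
            out[y][x] = sum(window) / len(window)
    return out

def multi_branch_fusion(feat, kernel_sizes=(1, 3, 5)):
    # Run every channel through branches with different receptive
    # fields and average the branch outputs (hypothetical fusion rule).
    fused = []
    for ch in feat:
        branches = [box_filter(ch, k) for k in kernel_sizes]
        H, W = len(ch), len(ch[0])
        fused.append([[sum(b[y][x] for b in branches) / len(branches)
                       for x in range(W)] for y in range(H)])
    return fused

def encoder_stage(feat):
    # Assumed ordering: fuse first, then channel and spatial attention.
    return spatial_attention(channel_attention(multi_branch_fusion(feat)))
```

Calling `encoder_stage` on a small nested list shaped channels x height x width returns a feature map of the same shape, with every value scaled by gates between 0 and 1. In the paper's design, the second-stage attention at the decoder would then operate on such encoder features to localize individual characters.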
Pages: 14219-14232
Number of pages: 14
Related papers
50 in total
  • [21] A Person Re-Identification Method Based on Multi-Branch Feature Fusion
    Wang, Xuefang
    Hu, Xintong
    Liu, Peishun
    Tang, Ruichun
    APPLIED SCIENCES-BASEL, 2023, 13 (21):
  • [22] An algorithm based on multi-branch feature cross fusion for archaeological illustration of murals
    Zeng, Xiaolin
    Cheng, Lei
    Li, Shanna
    Liu, Xueping
    HERITAGE SCIENCE, 2024, 12 (01):
  • [23] A Multi-Scale Natural Scene Text Detection Method Based on Attention Feature Extraction and Cascade Feature Fusion
    Li, Nianfeng
    Wang, Zhenyan
    Huang, Yongyuan
    Tian, Jia
    Li, Xinyuan
    Xiao, Zhiguo
    SENSORS, 2024, 24 (12)
  • [24] Research on Re-recognition Method of Multi-branch Fusion Attention Mechanism for Occluded Pedestrian
    Zhao, Haiyan
    Xu, Yan
    2023 8TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYTICS, ICCCBDA, 2023, : 477 - 480
  • [25] SGRN: SEMG-based gesture recognition network with multi-dimensional feature extraction and multi-branch information fusion
    Gan, Zhenhua
    Bai, Yuankun
    Wu, Peishu
    Xiong, Baoping
    Zeng, Nianyin
    Zou, Fumin
    Li, Jinyang
    Guo, Feng
    He, Dongyu
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 259
  • [26] A Lightweight and Multi-Branch Module in Facial Semantic Segmentation Feature Extraction
    Li, Yuxuan
    Wu, Jiatai
    Chen, Wenxiao
    Tan, Pengcheng
    Ngan, Chok-Tim
    Ou, Binkai
    IEEE ACCESS, 2024, 12 : 84803 - 84814
  • [27] A two-stage fusion remote sensing image dehazing network based on multi-scale feature and hybrid attention
    Miao, Mengjun
    Huang, Heming
    Da, Feipeng
    Song, Dongke
    Fan, Yonghong
    Zhang, Miao
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (SUPPL 1) : 373 - 383
  • [28] TAFFNet: Two-Stage Attention-Based Feature Fusion Network for Surface Defect Detection
    Cao, Jingang
    Yang, Guotian
    Yang, Xiyun
    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2022, 94 (12): : 1531 - 1544
  • [29] TAFFNet: Two-Stage Attention-Based Feature Fusion Network for Surface Defect Detection
    Jingang Cao
    Guotian Yang
    Xiyun Yang
    Journal of Signal Processing Systems, 2022, 94 : 1531 - 1544
  • [30] Multi-branch convolutional attention network for multi-sensor feature fusion in intelligent fault diagnosis of rotating machinery
    Wu, Ke
    Li, Zirui
    Chen, Chong
    Song, Zhenguo
    Wu, Jun
    QUALITY ENGINEERING, 2024, 36 (03) : 609 - 623