Scene text recognition based on two-stage attention and multi-branch feature fusion module

Cited by: 3
Authors
Xia, Shifeng [1 ,2 ]
Kou, Jinqiao [3 ]
Liu, Ningzhong [1 ,2 ]
Yin, Tianxiang [1 ,2 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 211106, Peoples R China
[2] Collaborat Innovat Ctr Novel Soft Ware Technol &, MIIT Key Lab Pattern Anal & Machine Intelligence, Nanjing 211106, Peoples R China
[3] Beijing Inst Comp Technol & Applicat, Fangzhou Key Lab, Beijing 100854, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Scene text recognition; Transformer; Attention mechanism; Feature fusion; NETWORK; EFFICIENT;
DOI
10.1007/s10489-022-04241-5
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recognizing text in natural scene images remains a challenging computer vision problem, even though it is already widely used in real-life applications. With the development of deep learning, the accuracy of scene text recognition has improved steadily. The encoder-decoder architecture is currently a general framework for scene text recognition, and 2D attention on the decoder helps the model attend to the position of each character. However, many encoder-decoder methods apply attention only on the decoder, which limits their ability to locate characters. To address this problem, we propose a Transformer-based encoder-decoder structure with a two-stage attention mechanism for scene text recognition. At the encoder, a first-stage attention module that integrates spatial attention and channel attention captures the overall location of the text in the image; at the decoder, a second-stage attention module pinpoints the position of each character in the text image. This two-stage attention mechanism locates the text more effectively and improves recognition accuracy. We also design a multi-branch feature fusion module for the encoder that fuses features from different receptive fields to obtain more robust features. We train the model on synthetic text datasets and test it on real scene text datasets. The experimental results show that our model is very competitive.
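The abstract does not give the internals of the first-stage attention module, so the following is only a rough illustration of the general idea of combining channel attention with spatial attention, not the authors' implementation. It uses parameter-free gates (a sigmoid of average-pooled activations) applied in parallel; a trained module would normally learn these weights. The function names and the nested-list feature map layout (channels, rows, columns) are assumptions made for this sketch.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def channel_gate(fmap):
    """One weight per channel: sigmoid of that channel's global average."""
    gates = []
    for channel in fmap:
        total = sum(sum(row) for row in channel)
        gates.append(sigmoid(total / (len(channel) * len(channel[0]))))
    return gates


def spatial_gate(fmap):
    """One weight per (row, col): sigmoid of the mean across channels."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [
        [sigmoid(sum(fmap[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]


def toy_spatial_channel_attention(fmap):
    """Rescale every activation by its channel gate and its spatial gate.

    Hypothetical toy version: both gates are computed from the same input
    and multiplied together, with no learned parameters.
    """
    cw = channel_gate(fmap)
    sw = spatial_gate(fmap)
    h, w = len(fmap[0]), len(fmap[0][0])
    return [
        [[fmap[k][i][j] * cw[k] * sw[i][j] for j in range(w)] for i in range(h)]
        for k in range(len(fmap))
    ]


if __name__ == "__main__":
    # Two channels of size 2 x 2: one all zeros, one all twos.
    features = [
        [[0.0, 0.0], [0.0, 0.0]],
        [[2.0, 2.0], [2.0, 2.0]],
    ]
    print(toy_spatial_channel_attention(features))
```

In this toy version the output keeps the input shape, and channels or positions with larger average activations are scaled down less than weaker ones.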
Pages: 14219 - 14232
Number of pages: 14
Related papers
50 in total
  • [41] An Improved YOLOv5 Underwater Detector Based on an Attention Mechanism and Multi-Branch Reparameterization Module
    Zhang, Jian
    Chen, Hongda
    Yan, Xinyue
    Zhou, Kexin
    Zhang, Jinshuai
    Zhang, Yonghui
    Jiang, Hong
    Shao, Bingqian
    ELECTRONICS, 2023, 12 (12)
  • [42] Research on road crack segmentation based on deep convolution and transformer with multi-branch feature fusion
    Lai, Yuebo
    Liu, Bing
    MEASUREMENT SCIENCE AND TECHNOLOGY, 2024, 35 (11)
  • [43] Mixed-type wafer defect detection based on multi-branch feature enhanced residual module
    Chen, Shouhong
    Huang, Zhentao
    Wang, Tao
    Hou, Xingna
    Ma, Jun
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 242
  • [44] A vehicle re-identification framework based on the improved multi-branch feature fusion network
    Rong, Leilei
    Xu, Yan
    Zhou, Xiaolei
    Han, Lisu
    Li, Linghui
    Pan, Xuguang
    SCIENTIFIC REPORTS, 2021, 11 (01)
  • [45] Multi-branch feature learning based speech emotion recognition using SCAR-NET
    Mao, Keji
    Wang, Yuxiang
    Ren, Ligang
    Zhang, Jinhong
    Qiu, Jiefan
    Dai, Guanglin
    CONNECTION SCIENCE, 2023, 35 (01)
  • [46] Multi-Branch Convolutional Neural Network for Automatic Sleep Stage Classification with Embedded Stage Refinement and Residual Attention Channel Fusion
    Zhu, Tianqi
    Luo, Wei
    Yu, Feng
    SENSORS, 2020, 20 (22) : 1 - 15
  • [47] A two-stage damage recognition method based on data fusion
    Wang, Xiaojuan
    Lan, Xiangyong
    Zhou, Hongyuan
    Wang, Lihui
    Zhang, Jian
    Zhendong yu Chongji/Journal of Vibration and Shock, 2024, 43 (17) : 132 - 144
  • [48] Multi-branch fusion graph neural network based on multi-head attention for childhood seizure detection
    Li, Yang
    Yang, Yang
    Song, Shangling
    Wang, Hongjun
    Sun, Mengzhou
    Liang, Xiaoyun
    Zhao, Penghui
    Wang, Baiyang
    Wang, Na
    Sun, Qiyue
    Han, Zijuan
    FRONTIERS IN PHYSIOLOGY, 2024, 15
  • [49] A Two-Stage Special Feature Deep Fusion Network with Spatial Attention for Hippocampus Segmentation
    Cai, Zhengwei
    Wang, Shaoyu
    Chen, Qiang
    Lin, Runlong
    Hu, Yun
    Zhu, Yian
    2021 IEEE INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND SOFTWARE ENGINEERING (ICICSE 2021), 2021 : 103 - 106
  • [50] A Multi-Branch Feature Fusion Network for Building Detection in Remote Sensing Images
    Li, Chao
    Huang, Xinyu
    Tang, Jiechen
    Wang, Kai
    IEEE ACCESS, 2021, 9 (09) : 168511 - 168519