Scene Text Recognition via Dual-path Network with Shape-driven Attention Alignment

Cited by: 0
Authors
Hu, Yijie [1]
Dong, Bin [2]
Huang, Kaizhu [3]
Ding, Lei [2]
Wang, Wei [1]
Huang, Xiaowei [4]
Wang, Qiu-Feng [1]
Affiliations
[1] Xian Jiaotong Liverpool Univ, Sch Adv Technol, Renai Rd, Suzhou 215000, Jiangsu, Peoples R China
[2] Ricoh Software Res Ctr Beijing Co Ltd, Xizhimenwai St, Beijing 100080, Peoples R China
[3] Duke Kunshan Univ, Data Sci Res Ctr, Duke Ave, Kunshan 215316, Jiangsu, Peoples R China
[4] Univ Liverpool, Dept Comp Sci, Lime St, Liverpool L69 3BX, Merseyside, England
Funding
National Natural Science Foundation of China; Engineering and Physical Sciences Research Council (UK);
Keywords
OCR; scene text recognition; deformable attention; attention alignment; dual path network;
DOI
10.1145/3633517
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Scene text recognition (STR), a typical sequence-to-sequence problem, has drawn much attention recently in multimedia applications. To guarantee good performance, it is essential for STR to obtain aligned character-wise features from the whole-image feature maps. While most existing works adopt fully data-driven attention-based alignment, this practice ignores character-specific geometric information. In this article, building upon a group of learnable geometric points, we propose a novel shape-driven attention alignment method that is able to obtain character-wise features. Concretely, we first design a corner detector to generate a shape map that guides the attention alignment explicitly, where a series of points can be learned to represent character-wise features flexibly. We then propose a dual-path network with a mutual learning and cooperating strategy that successfully combines a CNN with a ViT-based model, leading to further accuracy improvement. We conduct extensive experiments to evaluate the proposed method on various scene text benchmarks, including six popular regular and irregular datasets, two more challenging datasets (i.e., WordArt and OST), and three Chinese datasets. Experimental results indicate that our method can achieve superior performance with a comparable model size against many state-of-the-art models.
Pages: 20
Related papers
50 in total
  • [21] AN EFFICIENT DUAL-PATH ATTENTION SOLAR CELL DEFECT DETECTION NETWORK
    Zhou Y.
    Wang R.
    Yuan Z.
    Liu K.
    Chen H.
    Taiyangneng Xuebao/Acta Energiae Solaris Sinica, 2023, 44 (04): : 407 - 413
  • [22] Look back again: Dual parallel attention network for accurate and robust scene text recognition
    Fu, Zilong
    Xie, Hongtao
    Jin, Guoqing
    Guo, Junbo
    ICMR 2021 - Proceedings of the 2021 International Conference on Multimedia Retrieval, 2021, : 638 - 644
  • [23] A dual-path residual attention fusion network for infrared and visible images
    Zhishe Wang
    Yang F.
    Wang J.
    Xu J.
    Yang F.
    Ji L.
    Optik, 2023, 290
  • [24] Single Image Dehazing via Dual-Path Recurrent Network
    Zhang, Xiaoqin
    Jiang, Runhua
    Wang, Tao
    Luo, Wenhan
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 5211 - 5222
  • [25] A holistic representation guided attention network for scene text recognition
    Yang, Lu
    Wang, Peng
    Li, Hui
    Li, Zhen
    Zhang, Yanning
    NEUROCOMPUTING, 2020, 414 : 67 - 75
  • [26] Deep neural network with attention model for scene text recognition
    Li, Shuohao
    Tang, Min
    Guo, Qiang
    Lei, Jun
    Zhang, Jun
    IET COMPUTER VISION, 2017, 11 (07) : 605 - 612
  • [27] Deformable Mixed Domain Attention Network for Scene Text Recognition
    Huang, Yangyang
    Fang, Wei
    PROCEEDINGS OF 2020 IEEE 11TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND SERVICE SCIENCE (ICSESS 2020), 2020, : 142 - 145
  • [28] EPAN: Effective parts attention network for scene text recognition
    Huang, Yunlong
    Sun, Zenghui
    Jin, Lianwen
    Luo, Canjie
    NEUROCOMPUTING, 2020, 376 (376) : 202 - 213
  • [29] AIDAN: An Attention-Guided Dual-Path Network for Pediatric Echocardiography Segmentation
    Hu, Yujin
    Xia, Bei
    Mao, Muyi
    Jin, Zelong
    Du, Jie
    Guo, Libao
    Frangi, Alejandro F.
    Lei, Baiying
    Wang, Tianfu
    IEEE ACCESS, 2020, 8 : 29176 - 29187
  • [30] Depth Privileged Scene Recognition via Dual Attention Hallucination
    Chen, Junjie
    Niu, Li
    Zhang, Liqing
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 9164 - 9178