Scene Text Recognition with Transformer using Multi-patches

Authors
Wang Y. [1]
Ha J.-E. [2]
Affiliations
[1] Graduate School of Automotive Engineering, Seoul National University of Science and Technology
[2] Department of Mechanical and Automotive Engineering, Seoul National University of Science and Technology
Keywords
Deep learning; Scene text recognition; Transformer
DOI
10.5302/J.ICROS.2022.22.0107
Abstract
In this paper, we explore the application of the Vision Transformer (ViT) to scene text recognition (STR). A popular research direction in computer vision, scene text recognition enables computers to read text in natural scenes, such as object labels, text descriptions, and road signs. At present, traditional convolutional neural network (CNN)-based models achieve better performance. However, when faced with complex backgrounds and irregular scene text images, their performance is difficult to improve on curved text, diverse fonts, distortions, and similar cases. With the adoption of transformers in computer vision, transformer-based model structures have also developed significantly. Although current transformer-based models can reach performance similar to that of CNN-based models, they are still at an early stage of application, and there is much room for research and improvement. We propose a multi-scale vertical rectangular patch model (MSVSTR) that makes the transformer-based feature extractor more suitable for text images. Arranging the patches in a single direction means that cropping the image into patches better matches how text is distributed in a text image. In addition, to handle the varying number of characters across texts and to extract more robust features, vertical rectangular patches of different scales are used to crop the image. Ablation experiments show that our structure performs better than similar transformer-based STR models. Experiments also show that our structure performs well on seven benchmarks. © ICROS 2022.
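The patch scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give image sizes, patch widths, or how patches from different scales are combined, so the 32 x 128 image, the widths (4, 8, 16), the assumption that each patch spans the full image height, and the function names `vertical_patches` and `multi_scale_patches` are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence

# Grayscale image stored as rows of pixel values (height x width).
Image = List[List[float]]


def vertical_patches(image: Image, patch_width: int) -> List[List[float]]:
    """Split an image into full-height vertical strips of the given width.

    Assumption: each patch spans the whole image height, so patches are
    arranged along the horizontal (reading) direction only. Each strip is
    flattened row by row into a vector of length height * patch_width.
    """
    width = len(image[0])
    if width % patch_width != 0:
        raise ValueError("image width must be divisible by patch_width")
    patches = []
    for left in range(0, width, patch_width):
        strip = [row[left:left + patch_width] for row in image]
        patches.append([pixel for row in strip for pixel in row])
    return patches


def multi_scale_patches(
    image: Image, patch_widths: Sequence[int] = (4, 8, 16)
) -> Dict[int, List[List[float]]]:
    """Crop the same image with several patch widths (hypothetical scales)."""
    return {w: vertical_patches(image, w) for w in patch_widths}


if __name__ == "__main__":
    # Hypothetical 32 x 128 text image filled with dummy values.
    height, width = 32, 128
    image = [[float(x + y) for x in range(width)] for y in range(height)]
    for w, patches in multi_scale_patches(image).items():
        print(f"width={w}: {len(patches)} patches of length {len(patches[0])}")
```

Narrower patches give a longer token sequence with finer horizontal resolution, while wider patches give fewer tokens that each cover more of a character or several characters; how a model would embed and fuse these sequences is left open here.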
Pages: 862-867
Number of pages: 5