Multi-granularity Prediction for Scene Text Recognition

Cited by: 40
Authors
Wang, Peng [1]
Da, Cheng [1]
Yao, Cong [1]
Affiliation
[1] Alibaba DAMO Academy, Beijing, People's Republic of China
Source
Keywords
Scene text recognition; ViT; Multi-granularity prediction; EFFICIENT;
DOI
10.1007/978-3-031-19815-1_20
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Scene text recognition (STR) has been an active research topic in computer vision for years. To tackle this challenging problem, numerous innovative methods have been successively proposed, and incorporating linguistic knowledge into STR models has recently become a prominent trend. In this work, we first draw inspiration from the recent progress in Vision Transformer (ViT) to construct a conceptually simple yet powerful vision STR model, which is built upon ViT and outperforms previous state-of-the-art models for scene text recognition, including both pure vision models and language-augmented methods. To integrate linguistic knowledge, we further propose a Multi-Granularity Prediction strategy to inject information from the language modality into the model in an implicit way, i.e., subword representations (BPE and WordPiece) widely used in NLP are introduced into the output space, in addition to the conventional character-level representation, while no independent language model (LM) is adopted. The resultant algorithm (termed MGP-STR) is able to push the performance envelope of STR to an even higher level. Specifically, it achieves an average recognition accuracy of 93.35% on standard benchmarks.
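The idea of adding subword outputs alongside character outputs can be illustrated with a minimal sketch. It only shows how one word might be expressed at two granularities as prediction targets. It is not the authors' implementation: the toy vocabulary `TOY_VOCAB`, the function names, and the simplified greedy longest-match segmentation (WordPiece-style, with `##` marking word-internal pieces) are all assumptions made for illustration. The real BPE and WordPiece vocabularies are learned from large corpora.

```python
from typing import Dict, List, Set


def char_tokens(word: str) -> List[str]:
    """Character-level target: one token per character."""
    return list(word)


def greedy_subword_tokens(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Simplified WordPiece-style segmentation (greedy longest match first).

    Word-internal pieces carry a '##' prefix. If any position cannot be
    matched against the vocabulary, the whole word maps to the unknown token.
    """
    tokens: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]
        tokens.append(piece)
        start = end
    return tokens


def multi_granularity_targets(word: str, vocab: Set[str]) -> Dict[str, List[str]]:
    """Express the same word at character and subword granularity."""
    return {
        "char": char_tokens(word),
        "subword": greedy_subword_tokens(word, vocab),
    }


# Hypothetical toy vocabulary, for illustration only.
TOY_VOCAB: Set[str] = {"coff", "##ee", "##e", "shop", "##shop", "c", "##o", "##f"}

targets = multi_granularity_targets("coffeeshop", TOY_VOCAB)
print(targets["char"])
print(targets["subword"])
```

In this sketch the character target has ten tokens, while the subword target has only three. The subword pieces group characters into linguistically meaningful units, which is the kind of language-side signal the abstract describes as being injected implicitly through the output space.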
Pages: 339-355
Number of pages: 17