Multi-granularity Prediction for Scene Text Recognition

Cited by: 40
Authors
Wang, Peng [1]
Da, Cheng [1]
Yao, Cong [1]
Affiliation
[1] Alibaba DAMO Acad, Beijing, Peoples R China
Keywords
Scene text recognition; ViT; Multi-granularity prediction; EFFICIENT;
DOI
10.1007/978-3-031-19815-1_20
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Scene text recognition (STR) has been an active research topic in computer vision for years. To tackle this challenging problem, numerous innovative methods have been proposed in succession, and incorporating linguistic knowledge into STR models has recently become a prominent trend. In this work, we first draw inspiration from recent progress in Vision Transformer (ViT) to construct a conceptually simple yet powerful vision STR model, which is built upon ViT and outperforms previous state-of-the-art models for scene text recognition, including both pure vision models and language-augmented methods. To integrate linguistic knowledge, we further propose a Multi-Granularity Prediction strategy that injects information from the language modality into the model implicitly: subword representations (BPE and WordPiece) widely used in NLP are introduced into the output space in addition to the conventional character-level representation, while no independent language model (LM) is adopted. The resulting algorithm (termed MGP-STR) pushes the performance envelope of STR to an even higher level. Specifically, it achieves an average recognition accuracy of 93.35% on standard benchmarks.
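The abstract's central idea is that the same word image can be supervised at more than one output granularity. The sketch below is a toy illustration only, assuming a hypothetical five-entry vocabulary (`VOCAB`) and a greedy longest-match WordPiece-style splitter that uses BERT's `##` continuation prefix; it is not the tokenizer or vocabulary used by MGP-STR, and BPE (which builds its vocabulary by repeatedly merging frequent symbol pairs) is not shown. It simply contrasts character-level targets with the shorter subword targets for the same word.

```python
from typing import List, Set


def char_tokens(word: str) -> List[str]:
    """Character-level granularity: one prediction target per character."""
    return list(word)


def wordpiece_tokens(word: str, vocab: Set[str]) -> List[str]:
    """Greedy longest-match-first WordPiece-style split (illustrative only).

    Non-initial pieces carry the '##' continuation prefix. If some remaining
    suffix cannot be matched, the whole word maps to '[UNK]'.
    """
    tokens: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]
        tokens.append(piece)
        start = end
    return tokens


# Hypothetical toy vocabulary, chosen only for this illustration.
VOCAB = {"read", "##ing", "##s", "book", "##shop"}

if __name__ == "__main__":
    for w in ["reading", "bookshop"]:
        print(w, char_tokens(w), wordpiece_tokens(w, VOCAB))
```

Here "reading" yields seven character targets but only two subword targets (`read`, `##ing`). Coarser units like these carry some word-level (linguistic) regularity in the output space itself, which is the sense in which the abstract describes injecting language information without a separate LM.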
Pages: 339–355
Page count: 17