Multi-granularity Prediction for Scene Text Recognition

Cited by: 40
Authors
Wang, Peng [1]
Da, Cheng [1]
Yao, Cong [1]
Affiliations
[1] Alibaba DAMO Academy, Beijing, People's Republic of China
Source
Keywords
Scene text recognition; ViT; Multi-granularity prediction; EFFICIENT;
DOI
10.1007/978-3-031-19815-1_20
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Scene text recognition (STR) has been an active research topic in computer vision for years. To tackle this challenging problem, numerous innovative methods have been successively proposed, and incorporating linguistic knowledge into STR models has recently become a prominent trend. In this work, we first draw inspiration from the recent progress in Vision Transformer (ViT) to construct a conceptually simple yet powerful vision STR model, which is built upon ViT and outperforms previous state-of-the-art models for scene text recognition, including both pure vision models and language-augmented methods. To integrate linguistic knowledge, we further propose a Multi-Granularity Prediction strategy to inject information from the language modality into the model in an implicit way, i.e., subword representations (BPE and WordPiece) widely used in NLP are introduced into the output space, in addition to the conventional character-level representation, while no independent language model (LM) is adopted. The resultant algorithm (termed MGP-STR) is able to push the performance envelope of STR to an even higher level. Specifically, it achieves an average recognition accuracy of 93.35% on standard benchmarks.
Pages: 339-355
Page count: 17
Related papers
50 in total
  • [21] MULTI-GRANULARITY REASONING FOR SOCIAL RELATION RECOGNITION FROM IMAGES
    Zhang, Meng
    Liu, Xinchen
    Liu, Wu
    Zhou, Anfu
    Ma, Huadong
    Mei, Tao
    2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, : 1618 - 1623
  • [22] MUP: Multi-granularity Unified Perception for Panoramic Activity Recognition
    Cao, Meiqi
    Yan, Rui
    Shu, Xiangbo
    Zhang, Jiachao
    Wang, Jinpeng
    Xie, Guo-Sen
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 7666 - 7675
  • [23] Linguistic Steganalysis via Fusing Multi-Granularity Attentional Text Features
    WEN Juan
    DENG Yaqian
    PENG Wanli
    XUE Yiming
    Chinese Journal of Electronics, 2023, 32 (01) : 76 - 84
  • [24] Linguistic Steganalysis via Fusing Multi-Granularity Attentional Text Features
    Wen, Juan
    Deng, Yaqian
    Peng, Wanli
    Xue, Yiming
    CHINESE JOURNAL OF ELECTRONICS, 2023, 32 (01) : 76 - 84
  • [25] GAITMM: MULTI-GRANULARITY MOTION SEQUENCE LEARNING FOR GAIT RECOGNITION
    Wang, Lei
    Liu, Bo
    Wang, Bincheng
    Yu, Fuqiang
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 845 - 849
  • [26] Multi-Granularity Neural Sentence Model for Measuring Short Text Similarity
    Huang, Jiangping
    Yao, Shuxin
    Lyu, Chen
    Ji, Donghong
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2017), PT I, 2017, 10177 : 439 - 455
  • [27] A multi-granularity knowledge association model of geological text based on hypernetwork
    Can Zhuang
    Wenjia Li
    Zhong Xie
    Liang Wu
    Earth Science Informatics, 2021, 14 : 227 - 246
  • [28] Multi-Granularity Chinese Text Sentiment Analysis Driven by Knowledge and Data
    Liu, Zhongbao
    Wang, Yufei
    Computer Engineering and Applications, 2023, 59 (15) : 177 - 186
  • [29] A multi-granularity knowledge association model of geological text based on hypernetwork
    Zhuang, Can
    Li, Wenjia
    Xie, Zhong
    Wu, Liang
    EARTH SCIENCE INFORMATICS, 2021, 14 (01) : 227 - 246
  • [30] Short Text Hashing Improved by Integrating Multi-granularity Topics and Tags
    Xu, Jiaming
    Xu, Bo
    Tian, Guanhua
    Zhao, Jun
    Wang, Fangyuan
    Hao, Hongwei
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2015), PT I, 2015, 9041 : 444 - 455