TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text

Cited by: 47
Authors
Singh, Amanpreet [1]
Pang, Guan [1]
Toh, Mandy [1]
Huang, Jing [1]
Galuba, Wojciech [1]
Hassner, Tal [1]
Affiliations
[1] Facebook AI Research, Menlo Park, CA 94025, USA
DOI
10.1109/CVPR46437.2021.00869
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A crucial component of the scene-text-based reasoning required for the TextVQA and TextCaps datasets involves detecting and recognizing text present in the images using an optical character recognition (OCR) system. Current systems are crippled by the unavailability of ground-truth text annotations for these datasets, as well as by the lack of scene text detection and recognition datasets on real images, which hinders progress in the field of OCR and prevents evaluation of scene-text-based reasoning in isolation from OCR systems. In this work, we propose TextOCR, an arbitrary-shaped scene text detection and recognition dataset with 900k annotated words collected on real images from the TextVQA dataset. We show that current state-of-the-art text-recognition (OCR) models fail to perform well on TextOCR and that training on TextOCR helps achieve state-of-the-art performance on multiple other OCR datasets as well. We use a TextOCR-trained OCR model to create the PixelM4C model, which can do scene-text-based reasoning on an image in an end-to-end fashion, allowing us to revisit several design choices to achieve new state-of-the-art performance on the TextVQA dataset.
Pages: 8798 - 8808
Number of pages: 11
Related papers
50 in total
  • [41] Arbitrary-shaped scene text detection with keypoint-based shape representation
    Shuxin Qin
    Lin Chen
    International Journal on Document Analysis and Recognition (IJDAR), 2022, 25 : 115 - 127
  • [42] ESRNet: an exploring sample relationships network for arbitrary-shaped scene text detection
    Fan, Huageng
    Lu, Tongwei
    APPLIED INTELLIGENCE, 2024, 54 (22) : 11995 - 12008
  • [43] Arbitrary-shaped scene text detection with keypoint-based shape representation
    Qin, Shuxin
    Chen, Lin
    INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2022, 25 (02) : 115 - 127
  • [44] EEM: An End-to-end Evaluation Metric for Scene Text Detection and Recognition
    Hao, Jiedong
    Wen, Yafei
    Deng, Jie
    Gan, Jun
    Ren, Shuai
    Tan, Hui
    Chen, Xiaoxin
    DOCUMENT ANALYSIS AND RECOGNITION, ICDAR 2021, PT IV, 2021, 12824 : 95 - 108
  • [45] Person Re-identification with End-to-End Scene Text Recognition
    Kamlesh
    Xu, Pei
    Yang, Yang
    Xu, Yongchao
    COMPUTER VISION, PT III, 2017, 773 : 363 - 374
  • [46] End-to-End Analysis for Text Detection and Recognition in Natural Scene Images
    Alnefaie, Ahlam
    Gupta, Deepak
    Bhuyan, Monowar H.
    Razzak, Imran
    Gupta, Prashant
    Prasad, Mukesh
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [47] An end-to-end model for multi-view scene text recognition
    Banerjee, Ayan
    Shivakumara, Palaiahnakote
    Bhattacharya, Saumik
    Pal, Umapada
    Liu, Cheng-Lin
    PATTERN RECOGNITION, 2024, 149
  • [48] Transforming Scene Text Detection and Recognition: A Multi-Scale End-to-End Approach With Transformer Framework
    Geng, Tianyu
    IEEE ACCESS, 2024, 12 : 40582 - 40596
  • [49] A large-scale, passive analysis of end-to-end TCP performance over GPRS
    Benko, P
    Malicsko, G
    Veres, A
    IEEE INFOCOM 2004: THE CONFERENCE ON COMPUTER COMMUNICATIONS, VOLS 1-4, PROCEEDINGS, 2004, : 1882 - 1892
  • [50] Vigil: Effective End-to-end Monitoring for Large-scale Recommender Systems at Glance
    Saxena, Priyansh
    Manisha, R.
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 5249 - 5250