Scene Text Telescope: Text-Focused Scene Image Super-Resolution

Cited by: 75
Authors
Chen, Jingye [1]
Li, Bin [1]
Xue, Xiangyang [1]
Affiliations
[1] Fudan University, School of Computer Science, Shanghai Key Laboratory of Intelligent Information Processing, Shanghai, People's Republic of China
Keywords
NEURAL-NETWORK; ATTENTION NETWORK; RECOGNITION
DOI
10.1109/CVPR46437.2021.01185
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Image super-resolution, which is often regarded as a preprocessing step for scene text recognition, aims to recover realistic features from a low-resolution text image. The task is challenging because of large variations in text shapes, fonts, backgrounds, etc. However, most existing methods apply generic super-resolution frameworks to scene text images and ignore text-specific properties such as text-level layouts and character-level details. In this paper, we establish a text-focused super-resolution framework called Scene Text Telescope (STT). For text-level layouts, we propose a Transformer-Based Super-Resolution Network (TBSRN) containing a Self-Attention Module that extracts sequential information and is robust to text in arbitrary orientations. For character-level details, we propose a Position-Aware Module and a Content-Aware Module to highlight the position and the content of each character. Observing that some characters are hard to distinguish at low resolution, we use a weighted cross-entropy loss to address this problem. We conduct extensive experiments, including text recognition with pre-trained recognizers and image quality evaluation, on TextZoom and several scene text recognition benchmarks to assess the super-resolution images. The experimental results show that STT generates text-focused super-resolution images and outperforms existing methods in terms of recognition accuracy.
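The weighted cross-entropy loss mentioned in the abstract can be illustrated with a minimal sketch. This is a generic weighted cross-entropy, not the authors' implementation: the abstract does not say how the weights are chosen, so the toy alphabet, the weight values, and the function name `weighted_cross_entropy` are all assumptions made for illustration.

```python
import math


def weighted_cross_entropy(logits, target, class_weights):
    """Weighted cross-entropy for a single character prediction.

    logits: raw scores, one per character class.
    target: index of the ground-truth class.
    class_weights: per-class weights (hypothetical values; a larger
        weight makes a mistake on that class cost more).
    """
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    log_prob = logits[target] - log_sum
    return -class_weights[target] * log_prob


# Toy alphabet; 'c' and 'e' stand in for a pair of characters that is
# easily confused at low resolution (an assumed example).
alphabet = ["a", "c", "e"]
weights = [1.0, 2.0, 2.0]  # assumed weights, not values from the paper
logits = [0.1, 1.2, 1.0]   # made-up scores for one character position

plain = weighted_cross_entropy(logits, 1, [1.0, 1.0, 1.0])
weighted = weighted_cross_entropy(logits, 1, weights)
print(f"unweighted: {plain:.4f}, weighted: {weighted:.4f}")
```

With these assumed weights, the loss for the ground-truth character 'c' is exactly twice the unweighted loss, so errors on the confusable pair contribute more to the training signal than errors on other characters.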
Pages: 12021-12030
Page count: 10