Scene Text Telescope: Text-Focused Scene Image Super-Resolution

被引:75
|
作者
Chen, Jingye [1 ]
Li, Bin [1 ]
Xue, Xiangyang [1 ]
机构
[1] Fudan Univ, Sch Comp Sci, Shanghai Key Lab Intelligent Informat Proc, Shanghai, Peoples R China
关键词
NEURAL-NETWORK; ATTENTION NETWORK; RECOGNITION;
D O I
10.1109/CVPR46437.2021.01185
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Image super-resolution, which is often regarded as a preprocessing procedure of scene text recognition, aims to recover the realistic features from a low-resolution text image. It has always been challenging due to large variations in text shapes, fonts, backgrounds, etc. However, most existing methods employ generic super-resolution frameworks to handle scene text images while ignoring text-specific properties such as text-level layouts and character-level details. In this paper, we establish a text-focused super-resolution framework, called Scene Text Telescope (STT). In terms of text-level layouts, we propose a Transformer-Based Super-Resolution Network (TBSRN) containing a Self-Attention Module to extract sequential information, which is robust to tackle the texts in arbitrary orientations. In terms of character-level details, we propose a Position-Aware Module and a Content-Aware Module to highlight the position and the content of each character. By observing that some characters look indistinguishable in low-resolution conditions, we use a weighted cross-entropy loss to tackle this problem. We conduct extensive experiments, including text recognition with pre-trained recognizers and image quality evaluation, on TextZoom and several scene text recognition benchmarks to assess the super-resolution images. The experimental results show that our STT can indeed generate text-focused super-resolution images and outperform the existing methods in terms of recognition accuracy.
引用
收藏
页码:12021 / 12030
页数:10
相关论文
共 50 条
  • [1] Scene Text Segmentation with Text-Focused Transformers
    Yu, Haiyang
    Wang, Xiaocong
    Niu, Ke
    Li, Bin
    Xue, Xiangyang
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 2898 - 2907
  • [2] Text Prior Guided Scene Text Image Super-Resolution
    Ma, Jianqi
    Guo, Shi
    Zhang, Lei
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 1341 - 1353
  • [3] Text Gestalt: Stroke-Aware Scene Text Image Super-resolution
    Chen, Jingye
    Yu, Haiyang
    Ma, Jianqi
    Li, Bin
    Xue, Xiangyang
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 285 - 293
  • [4] Batch-transformer for scene text image super-resolution
    Sun, Yaqi
    Xie, Xiaolan
    Li, Zhi
    Yang, Kai
    VISUAL COMPUTER, 2024, 40 (10): : 7399 - 7409
  • [5] Perceiving Multiple Representations for scene text image super-resolution guided by text recognizer
    Shi, Qin
    Zhu, Yu
    Liu, Yatong
    Ye, Jiongyao
    Yang, Dawei
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 124
  • [6] A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-resolution
    Ma, Jianqi
    Liang, Zhetong
    Zhang, Lei
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 5901 - 5910
  • [7] Scene Text Image Super-Resolution Via Semantic Distillation and Text Perceptual Loss
    Zhao, Cairong
    Shu, Rui
    Feng, Shuyang
    Zhu, Liang
    Wang, Xuekuan
    IEEE TRANSACTIONS ON MULTIMEDIA, 2025, 27 : 1153 - 1164
  • [8] A Benchmark for Chinese-English Scene Text Image Super-resolution
    Ma, Jianqi
    Liang, Zhetong
    Xiang, Wangmeng
    Yang, Xi
    Zhang, Lei
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 19395 - 19404
  • [9] GARDEN: Generative Prior Guided Network for Scene Text Image Super-Resolution
    Kong, Yuxin
    Ma, Weihong
    Jin, Lianwen
    Xue, Yang
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT V, 2024, 14808 : 196 - 214
  • [10] Scene Text Image Super-Resolution via Parallelly Contextual Attention Network
    Zhao, Cairong
    Feng, Shuyang
    Zhao, Brian Nlong
    Ding, Zhijun
    Wu, Jun
    Shen, Fuming
    Shen, Heng Tao
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 2908 - 2917