Towards Accurate Video Text Spotting with Text-wise Semantic Reasoning

Cited by: 0
Authors
Zu, Xinyan [1]
Yu, Haiyang [1]
Li, Bin [1]
Xue, Xiangyang [1]
Affiliations
[1] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Sch Comp Sci, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
RECOGNITION;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video text spotting (VTS) aims at extracting texts from videos, where text detection, tracking and recognition are conducted simultaneously. There have been some works that can tackle VTS; however, they may ignore the underlying semantic relationships among texts within a frame. We observe that the texts within a frame usually share similar semantics, which suggests that, if one text is predicted incorrectly by a text recognizer, it still has a chance to be corrected via semantic reasoning. In this paper, we propose an accurate video text spotter, VLSpotter, that reads texts visually, linguistically, and semantically. For 'visually', we propose a plug-and-play text-focused super-resolution module to alleviate motion blur and enhance video quality. For 'linguistically', a language model is employed to capture intra-text context to mitigate wrongly spelled text predictions. For 'semantically', we propose a text-wise semantic reasoning module to model inter-text semantic relationships and reason for better results. The experimental results on multiple VTS benchmarks demonstrate that the proposed VLSpotter outperforms the existing state-of-the-art methods in end-to-end video text spotting.
Pages: 1858-1866
Page count: 9