Towards Accurate Video Text Spotting with Text-wise Semantic Reasoning

Cited by: 0
Authors
Zu, Xinyan [1]
Yu, Haiyang [1]
Li, Bin [1]
Xue, Xiangyang [1]
Affiliations
[1] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Sch Comp Sci, Shanghai, Peoples R China
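Funding
National Natural Science Foundation of China
Keywords
RECOGNITION
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract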
Video text spotting (VTS) aims to extract text from videos by performing text detection, tracking, and recognition simultaneously. Existing methods can tackle VTS, but they may ignore the underlying semantic relationships among the texts within a frame. We observe that the texts within a frame usually share similar semantics, which suggests that a text predicted incorrectly by a text recognizer can still be corrected via semantic reasoning. In this paper, we propose an accurate video text spotter, VLSpotter, that reads texts visually, linguistically, and semantically. For 'visually', we propose a plug-and-play text-focused super-resolution module to alleviate motion blur and enhance video quality. For 'linguistically', a language model is employed to capture intra-text context and reduce misspelled text predictions. For 'semantically', we propose a text-wise semantic reasoning module that models inter-text semantic relationships and uses them to refine the predictions. Experimental results on multiple VTS benchmarks demonstrate that the proposed VLSpotter outperforms existing state-of-the-art methods in end-to-end video text spotting.
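The intuition that texts in the same frame share semantics can be illustrated with a toy re-ranking sketch. This is not the VLSpotter module, which the abstract does not specify in detail. The embedding table, the `rerank` function, and the `weight` parameter are all hypothetical and chosen only to show how context from neighbouring texts might overturn a slightly more confident but wrong recognizer output.

```python
from math import sqrt

# Hypothetical 3-d "semantic" vectors, hand-picked for illustration only.
# A real system would use learned text embeddings instead.
TOY_EMBEDDINGS = {
    "coffee": (0.9, 0.1, 0.0),
    "espresso": (0.8, 0.2, 0.0),
    "latte": (0.85, 0.15, 0.0),
    "coffin": (0.0, 0.1, 0.9),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def context_score(word, context_words):
    """Mean similarity between `word` and the other texts in the frame."""
    vec = TOY_EMBEDDINGS.get(word)
    if vec is None:
        return 0.0
    sims = [cosine(vec, TOY_EMBEDDINGS[c])
            for c in context_words if c in TOY_EMBEDDINGS]
    return sum(sims) / len(sims) if sims else 0.0


def rerank(candidates, context_words, weight=0.5):
    """Pick the candidate with the best blend of recognizer confidence
    and semantic agreement with the frame's other texts.

    candidates: list of (word, recognizer_confidence) pairs.
    weight: assumed mixing factor between the two scores.
    """
    scored = [
        (word, (1 - weight) * conf + weight * context_score(word, context_words))
        for word, conf in candidates
    ]
    return max(scored, key=lambda pair: pair[1])[0]


# The recognizer slightly prefers "coffin", but the frame also shows
# "espresso" and "latte", so the context favours "coffee".
candidates = [("coffin", 0.55), ("coffee", 0.45)]
best_with_context = rerank(candidates, ["espresso", "latte"])
best_without_context = rerank(candidates, [])
```

With the neighbouring texts supplied, the sketch selects "coffee"; with no context, it falls back to the recognizer's top choice, "coffin".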
Pages: 1858-1866
Number of pages: 9