Towards Accurate Video Text Spotting with Text-wise Semantic Reasoning

Authors
Zu, Xinyan [1]
Yu, Haiyang [1]
Li, Bin [1]
Xue, Xiangyang [1]
Affiliations
[1] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Sch Comp Sci, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
RECOGNITION;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video text spotting (VTS) aims at extracting texts from videos, where text detection, tracking and recognition are conducted simultaneously. There have been some works that can tackle VTS; however, they may ignore the underlying semantic relationships among texts within a frame. We observe that the texts within a frame usually share similar semantics, which suggests that, if one text is predicted incorrectly by a text recognizer, it still has a chance to be corrected via semantic reasoning. In this paper, we propose an accurate video text spotter, VLSpotter, that reads texts visually, linguistically, and semantically. For 'visually', we propose a plug-and-play text-focused super-resolution module to alleviate motion blur and enhance video quality. For 'linguistically', a language model is employed to capture intra-text context to mitigate wrongly spelled text predictions. For 'semantically', we propose a text-wise semantic reasoning module to model inter-text semantic relationships and reason for better results. The experimental results on multiple VTS benchmarks demonstrate that the proposed VLSpotter outperforms the existing state-of-the-art methods in end-to-end video text spotting.
Pages: 1858-1866
Page count: 9
Related papers
50 in total
  • [21] Towards End-to-End Text Spotting in Natural Scenes
    Wang, Peng
    Li, Hui
    Shen, Chunhua
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (10) : 7266 - 7281
  • [22] Towards Query by Text Example for Pattern Spotting In Historical Documents
    Cheddad, Abbas
    2016 7TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY (CSIT), 2016,
  • [23] Text Spotting towards Perceptually Aliased Urban Place Recognition
    Hettiarachchi, Dulmini
    Tian, Ye
    Yu, Han
    Kamijo, Shunsuke
    MULTIMODAL TECHNOLOGIES AND INTERACTION, 2022, 6 (11)
  • [24] Towards Unified Scene Text Spotting based on Sequence Generation
    Kil, Taeho
    Kim, Seonghyeon
    Seo, Sukmin
    Kim, Yoonsik
    Kim, Daehee
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 15223 - 15232
  • [25] PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text
    Wang, Wenhai
    Xie, Enze
    Li, Xiang
    Liu, Xuebo
    Liang, Ding
    Yang, Zhibo
    Lu, Tong
    Shen, Chunhua
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (09) : 5349 - 5367
  • [26] A semantic case-based reasoning framework for text categorization
    Ceausu, Valentina
    Despres, Sylvie
    SEMANTIC WEB, PROCEEDINGS, 2007, 4825 : 736 - +
  • [27] Progressive Semantic Matching for Video-Text Retrieval
    Liu, Hongying
    Luo, Ruyi
    Shang, Fanhua
    Niu, Mantang
    Liu, Yuanyuan
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 5083 - 5091
  • [28] ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer
    Huang, Mingxin
    Zhang, Jiaxin
    Peng, Dezhi
    Lu, Hao
    Huang, Can
    Liu, Yuliang
    Bai, Xiang
    Jin, Lianwen
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 19438 - 19448
  • [29] An Effective Approach Towards Video Text Recognition
    Sudir, Prakash
    Ravishankar, M.
    ADVANCES IN SIGNAL PROCESSING AND INTELLIGENT RECOGNITION SYSTEMS, 2014, 264 : 323 - 333
  • [30] Towards Table-to-Text Generation with Numerical Reasoning
    Suadaa, Lya Hulliyyatus
    Kamigaito, Hidetaka
    Funakoshi, Kotaro
    Okumura, Manabu
    Takamura, Hiroya
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 1451 - 1465