Towards Accurate Video Text Spotting with Text-wise Semantic Reasoning

Cited by: 0

Authors:

Zu, Xinyan [1]

Yu, Haiyang [1]

Li, Bin [1]

Xue, Xiangyang [1]

Affiliations:

[1] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Sch Comp Sci, Shanghai, Peoples R China

Source:

PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023 | 2023

Funding:

National Natural Science Foundation of China;

Keywords:

RECOGNITION;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Video text spotting (VTS) aims at extracting texts from videos, where text detection, tracking and recognition are conducted simultaneously. There have been some works that can tackle VTS; however, they may ignore the underlying semantic relationships among texts within a frame. We observe that the texts within a frame usually share similar semantics, which suggests that, if one text is predicted incorrectly by a text recognizer, it still has a chance to be corrected via semantic reasoning. In this paper, we propose an accurate video text spotter, VLSpotter, that reads texts visually, linguistically, and semantically. For 'visually', we propose a plug-and-play text-focused super-resolution module to alleviate motion blur and enhance video quality. For 'linguistically', a language model is employed to capture intra-text context to mitigate wrongly spelled text predictions. For 'semantically', we propose a text-wise semantic reasoning module to model inter-text semantic relationships and reason for better results. The experimental results on multiple VTS benchmarks demonstrate that the proposed VLSpotter outperforms the existing state-of-the-art methods in end-to-end video text spotting.


Pages: 1858 - 1866

Number of pages: 9

50 items in total

[1] TOWARDS ACCURATE INSTANCE-LEVEL TEXT SPOTTING WITH GUIDED ATTENTION
Wang, Haiyan
Rong, Xuejian
Tian, Yingli
2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, : 994 - 999
[2] Exploiting Visual Semantic Reasoning for Video-Text Retrieval
Feng, Zerun
Zeng, Zhimin
Guo, Caili
Li, Zheng
PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 1005 - 1011
[3] You Only Recognize Once: Towards Fast Video Text Spotting
Cheng, Zhanzhan
Lu, Jing
Niu, Yi
Pu, Shiliang
Wu, Fei
Zhou, Shuigeng
PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 855 - 863
[4] Textual Visual Semantic Dataset for Text Spotting
Sabir, Ahmed
Moreno-Noguer, Francesc
Padro, Lluis
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020), 2020, : 2306 - 2315
[5] Balanced Synthetic Data for Accurate Scene Text Spotting
Yao, Ying
Huang, Zhangjin
TENTH INTERNATIONAL CONFERENCE ON DIGITAL IMAGE PROCESSING (ICDIP 2018), 2018, 10806
[6] ADVMIX: DATA AUGMENTATION FOR ACCURATE SCENE TEXT SPOTTING
Huang, Yizhang
Fang, Kun
Huang, Xiaolin
Yang, Jie
2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 954 - 958
[7] ICDAR 2021 Competition on Scene Video Text Spotting
Cheng, Zhanzhan
Lu, Jing
Zou, Baorui
Zhou, Shuigeng
Wu, Fei
DOCUMENT ANALYSIS AND RECOGNITION, ICDAR 2021, PT IV, 2021, 12824 : 650 - 662
[8] End-to-End Video Text Spotting with Transformer
Wu, Weijia
Cai, Yuanqiang
Shen, Chunhua
Zhang, Debing
Fu, Ying
Zhou, Hong
Luo, Ping
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (09) : 4019 - 4035
[9] Online Reasoning for Semantic Error Detection in Text
Gutierrez F.
Dou D.
de Silva N.
Fickas S.
Dou, Dejing (dou@cs.uoregon.edu), 1600, Springer Science and Business Media Deutschland GmbH (06): : 139 - 153
[10] A Bilingual, Open World Video Text Dataset and Real-Time Video Text Spotting With Contrastive Learning
Wu, Weijia
Li, Zhuang
Cai, Yuanqiang
Zhou, Hong
Zheng Shou, Mike
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (01) : 534 - 546
