Towards Accurate Video Text Spotting with Text-wise Semantic Reasoning

Times cited: 0

Authors:

Zu, Xinyan [1]

Yu, Haiyang [1]

Li, Bin [1]

Xue, Xiangyang [1]

Affiliation:

[1] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Sch Comp Sci, Shanghai, Peoples R China

Source:

PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023 | 2023

Funding:

National Natural Science Foundation of China;

Keywords:

RECOGNITION;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104; 0812; 0835; 1405;

Abstract:

Video text spotting (VTS) aims to extract text from videos by performing text detection, tracking, and recognition simultaneously. Several existing works can tackle VTS; however, they may ignore the underlying semantic relationships among texts within a frame. We observe that the texts within a frame usually share similar semantics, which suggests that if one text is misrecognized by a text recognizer, it may still be corrected via semantic reasoning. In this paper, we propose an accurate video text spotter, VLSpotter, that reads texts visually, linguistically, and semantically. For 'visually', we propose a plug-and-play text-focused super-resolution module to alleviate motion blur and enhance video quality. For 'linguistically', a language model is employed to capture intra-text context and mitigate misspelled text predictions. For 'semantically', we propose a text-wise semantic reasoning module that models inter-text semantic relationships and reasons over them to refine the results. Experimental results on multiple VTS benchmarks demonstrate that the proposed VLSpotter outperforms existing state-of-the-art methods in end-to-end video text spotting.
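The abstract's central intuition (a misread word can be corrected using the other texts in the same frame) can be illustrated with a toy re-ranking step. This is a minimal sketch under our own assumptions, not VLSpotter's text-wise semantic reasoning module, which the abstract describes only at a high level. The embedding table `TOY_EMBEDDINGS`, the helpers `embed`, `cosine`, and `rerank_frame`, and the mixing weight `alpha` are all hypothetical; the linear mix of recognizer confidence and cosine similarity to the frame context is a simple stand-in technique chosen for clarity.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy "semantic" vectors (2-D topics: drinks vs. vehicles).
# A real spotter would learn text-wise features; this table is illustrative only.
TOY_EMBEDDINGS: Dict[str, Tuple[float, ...]] = {
    "COFFEE": (1.0, 0.0),
    "LATTE": (1.0, 0.0),
    "MOCHA": (0.9, 0.1),
    "MOTOR": (0.0, 1.0),
}

# One text instance = list of (transcription, recognizer confidence) candidates.
Candidates = List[Tuple[str, float]]


def embed(word: str) -> Tuple[float, ...]:
    # Hypothetical lookup; unknown words get a zero vector (no semantic evidence).
    return TOY_EMBEDDINGS.get(word.upper(), (0.0, 0.0))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rerank_frame(frame: List[Candidates], alpha: float = 0.5) -> List[str]:
    """Hypothetical re-ranker: pick one transcription per (non-empty) instance by
    mixing its recognizer score with its similarity to the other texts' top-1 words."""
    top1 = [max(cands, key=lambda c: c[1])[0] for cands in frame]
    results = []
    for i, cands in enumerate(frame):
        others = [embed(w) for j, w in enumerate(top1) if j != i]
        if not others:
            # No other text in the frame: keep the recognizer's own choice.
            results.append(top1[i])
            continue
        # Frame context = mean embedding of the other instances' top-1 predictions.
        dim = len(others[0])
        context = [sum(v[d] for v in others) / len(others) for d in range(dim)]
        best = max(
            cands,
            key=lambda c: (1 - alpha) * c[1] + alpha * cosine(embed(c[0]), context),
        )
        results.append(best[0])
    return results


if __name__ == "__main__":
    frame = [
        [("COFFEE", 0.95)],
        [("LATTE", 0.90)],
        [("MOTOR", 0.52), ("MOCHA", 0.48)],  # low-margin, visually ambiguous read
    ]
    print(rerank_frame(frame))  # ['COFFEE', 'LATTE', 'MOCHA']
```

With `alpha=0.0` the function simply returns the recognizer's top-1 choices (here `MOTOR` for the third instance); a positive `alpha` lets the drink-themed context override the low-margin prediction, which mirrors the kind of inter-text correction the abstract motivates.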


Pages: 1858–1866

Number of pages: 9
