DPGS: Cross-cooperation guided dynamic points generation for scene text spotting

Cited: 0
Authors
Sun, Wei [1 ,3 ]
Wang, Qianzhou [1 ]
Hou, Zhiqiang [1 ]
Chen, Xueling [1 ,3 ]
Yan, Qingsen [2 ,3 ]
Zhang, Yanning [2 ,3 ]
Affiliations
[1] Xian Univ Posts & Telecommun, Sch Comp Sci & Technol, Xian, Peoples R China
[2] Northwestern Polytech Univ, Sch Comp Sci & Engn, Xian, Peoples R China
[3] Natl Engn Lab Integrated Aerosp Ground Ocean Big D, Beijing, Peoples R China
Keywords
Scene text detection and recognition; Coarse-to-fine; Cross-cooperative learning; K-NN search;
DOI
10.1016/j.knosys.2024.112399
CLC Classification Number
TP18 [Artificial intelligence theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
End-to-end text spotting aims to combine scene text detection and recognition into a unified framework. Handling the relationship between the two sub-tasks plays a pivotal role in designing effective spotters. While polygon- or segmentation-based methods eliminate heuristic post-processing, they still face challenges such as background noise and a high computational burden. In this study, we introduce DPGS, a coarse-to-fine learning framework that performs Dynamic Points Generation for text Spotting. DPGS simultaneously learns character representations for both the detection and recognition tasks. Specifically, for each text instance, we represent the character sequence as ordered points and model them with learnable point queries. This approach progressively selects appropriate key points covering the characters and leverages group attention to associate similar information from different positions, improving detection accuracy. After passing through a single decoder, the point queries encode text semantics and locations, enabling simple prediction heads to decode the center line, boundary, script, and confidence of the text. Additionally, we introduce an adaptive cooperative criterion to combine more useful feature knowledge, enhancing training efficiency. Extensive experiments show the superiority of DPGS on scene text detection and recognition tasks. Compared to the respective top-1 methods, DPGS improves the average recognition accuracy by 3.7%, 1.9%, and 0.7% on the Total-Text, ICDAR15, and CTW1500 datasets, respectively.
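The abstract's coarse-to-fine idea (ordered point queries refined by associating information from nearby positions, with k-NN search as one of the listed techniques) can be illustrated by a minimal geometric sketch. This is a hypothetical toy, not the paper's implementation: the function names, the centroid-blending update, and the candidate points are all assumptions made for illustration.

```python
import math

def knn(points, query, k):
    """Return the k candidate points nearest to `query` (Euclidean distance)."""
    return sorted(points, key=lambda p: math.dist(p, query))[:k]

def refine_point(query, neighbors):
    """One coarse-to-fine step: blend the point query with its neighborhood centroid."""
    cx = sum(p[0] for p in neighbors) / len(neighbors)
    cy = sum(p[1] for p in neighbors) / len(neighbors)
    return (0.5 * (query[0] + cx), 0.5 * (query[1] + cy))

# Hypothetical candidate character key points lying along a text line near y = 0.
candidates = [(0.0, 0.1), (1.0, -0.1), (2.0, 0.2), (3.0, 0.0), (4.0, 0.1)]

# A single point query, initialized coarsely off the text line, refined iteratively.
q = (1.5, 1.0)
for _ in range(3):
    q = refine_point(q, knn(candidates, q, k=3))
print(q)  # the query drifts toward the text line
```

In the paper this association step is done with learned group attention rather than a hard centroid update; the sketch only shows why aggregating nearby key points pulls a coarse query toward the character region.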
Pages: 11
Related Papers
11 records in total
  • [1] Towards Unified Scene Text Spotting based on Sequence Generation
    Kil, Taeho
    Kim, Seonghyeon
    Seo, Sukmin
    Kim, Yoonsik
    Kim, Daehee
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 15223 - 15232
  • [2] WACNET: WORD SEGMENTATION GUIDED CHARACTERS AGGREGATION NET FOR SCENE TEXT SPOTTING WITH ARBITRARY SHAPES
    Gao, Yuting
    Huang, Zheng
    Dai, Yuchen
    Chen, Kai
    Guo, Jie
    Qiu, Weidong
    2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 3382 - 3386
  • [3] FOLLOW THE CURVE: ARBITRARILY ORIENTED SCENE TEXT DETECTION USING KEY POINTS SPOTTING AND CURVE PREDICTION
    Yuan, Ke
    He, Dafang
    Yang, Xiao
    Tang, Zhi
    Wei, Daniel
    Giles, C. Lee
    2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2020,
  • [4] BPDO: BOUNDARY POINTS DYNAMIC OPTIMIZATION FOR ARBITRARY SHAPE SCENE TEXT DETECTION
    Zheng, Jinzhi
    Zhang, Libo
    Wu, Yanjun
    Zhao, Chen
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 5345 - 5349
  • [5] ScenarioDiff: Text-to-video Generation with Dynamic Transformations of Scene Conditions
    Zhang, Yipeng
    Wang, Xin
    Chen, Hong
    Qin, Chenyang
    Hao, Yibo
    Mei, Hong
    Zhu, Wenwu
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025,
  • [6] Inverse-Like Antagonistic Scene Text Spotting via Reading-Order Estimation and Dynamic Sampling
    Zhang, Shi-Xue
    Yang, Chun
    Zhu, Xiaobin
    Zhou, Hongyang
    Wang, Hongfa
    Yin, Xu-Cheng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 825 - 839
  • [7] DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer
    Ye, Maoyuan
    Zhang, Jing
    Zhao, Shanshan
    Liu, Juhua
    Du, Bo
    Tao, Dacheng
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3, 2023, : 3241 - 3249
  • [8] A Unified Approach for Text- and Image-guided 4D Scene Generation
    Zheng, Yufeng
    Li, Xueting
    Nagano, Koki
    Liu, Sifei
    Hilliges, Otmar
    De Mello, Shalini
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 7300 - 7309
  • [9] 3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation
    Jiang, Zutao
    Lu, Guansong
    Liang, Xiaodan
    Zhu, Jihua
    Zhang, Wei
    Chang, Xiaojun
    Xu, Hang
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 1, 2023, : 1051 - 1059
  • [10] Cross-modal Generation and Alignment via Attribute-guided Prompt for Unsupervised Text-Based Person Retrieval
    Li, Zongyi
    Li, Jianbo
    Shi, Yuxuan
    Ling, Hefei
    Chen, Jiazhong
    Wang, Runsheng
    Huang, Shijuan
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024, : 1047 - 1055