DPGS: Cross-cooperation guided dynamic points generation for scene text spotting

Cited: 0
Authors
Sun, Wei [1 ,3 ]
Wang, Qianzhou [1 ]
Hou, Zhiqiang [1 ]
Chen, Xueling [1 ,3 ]
Yan, Qingsen [2 ,3 ]
Zhang, Yanning [2 ,3 ]
Affiliations
[1] Xian Univ Posts & Telecommun, Sch Comp Sci & Technol, Xian, Peoples R China
[2] Northwestern Polytech Univ, Sch Comp Sci & Engn, Xian, Peoples R China
[3] Natl Engn Lab Integrated Aerosp Ground Ocean Big D, Beijing, Peoples R China
Keywords
Scene text detection and recognition; Coarse-to-fine; Cross-cooperative learning; K-NN search;
DOI
10.1016/j.knosys.2024.112399
CLC Classification Number
TP18 [Artificial intelligence theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
End-to-end text spotting aims to combine scene text detection and recognition into a unified framework. Handling the relationship between the two sub-tasks plays a pivotal role in designing effective spotters. While polygon- or segmentation-based methods eliminate heuristic post-processing, they still face challenges such as background noise and a high computational burden. In this study, we introduce DPGS, a coarse-to-fine learning framework that performs Dynamic Points Generation for text Spotting. DPGS simultaneously learns character representations for both the detection and recognition tasks. Specifically, for each text instance, we represent the character sequence as ordered points and model them with learnable point queries. This approach progressively selects appropriate key points covering the characters and leverages group attention to associate similar information from different positions, improving detection accuracy. After passing through a single decoder, the point queries encode text semantics and locations, enabling simple prediction heads to decode the center line, boundary, script, and confidence of the text. Additionally, we introduce an adaptive cooperative criterion to combine more useful feature knowledge, enhancing training efficiency. Extensive experiments show the superiority of DPGS on scene text detection and recognition tasks. Compared to the respective top-1 methods, DPGS improves the average recognition accuracy by 3.7%, 1.9%, and 0.7% on the Total-Text, ICDAR15, and CTW1500 datasets, respectively.
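The abstract's coarse-to-fine idea (ordered point queries refined by associating information from nearby positions, with k-NN search as one of the listed techniques) can be illustrated by a minimal geometric sketch. This is a hypothetical toy, not the paper's implementation: the function names, the centroid-blending update, and the candidate points are all assumptions made for illustration.

```python
import math

def knn(points, query, k):
    """Return the k candidate points nearest to `query` (Euclidean distance)."""
    return sorted(points, key=lambda p: math.dist(p, query))[:k]

def refine_point(query, neighbors):
    """One coarse-to-fine step: blend the point query with its neighborhood centroid."""
    cx = sum(p[0] for p in neighbors) / len(neighbors)
    cy = sum(p[1] for p in neighbors) / len(neighbors)
    return (0.5 * (query[0] + cx), 0.5 * (query[1] + cy))

# Hypothetical candidate character key points lying along a text line near y = 0.
candidates = [(0.0, 0.1), (1.0, -0.1), (2.0, 0.2), (3.0, 0.0), (4.0, 0.1)]

# A single point query, initialized coarsely off the text line, refined iteratively.
q = (1.5, 1.0)
for _ in range(3):
    q = refine_point(q, knn(candidates, q, k=3))
print(q)  # the query drifts toward the text line
```

In the paper this association step is done with learned group attention rather than a hard centroid update; the sketch only shows why aggregating nearby key points pulls a coarse query toward the character region.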
Pages: 11
Related Papers
11 records in total
  • [1] Towards Unified Scene Text Spotting based on Sequence Generation
    Kil, Taeho
    Kim, Seonghyeon
    Seo, Sukmin
    Kim, Yoonsik
    Kim, Daehee
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 15223 - 15232
  • [2] WACNET: WORD SEGMENTATION GUIDED CHARACTERS AGGREGATION NET FOR SCENE TEXT SPOTTING WITH ARBITRARY SHAPES
    Gao, Yuting
    Huang, Zheng
    Dai, Yuchen
    Chen, Kai
    Guo, Jie
    Qiu, Weidong
    2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 3382 - 3386
  • [3] FOLLOW THE CURVE: ARBITRARILY ORIENTED SCENE TEXT DETECTION USING KEY POINTS SPOTTING AND CURVE PREDICTION
    Yuan, Ke
    He, Dafang
    Yang, Xiao
    Tang, Zhi
    Wei, Daniel
    Giles, C. Lee
    2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2020,
  • [4] BPDO: BOUNDARY POINTS DYNAMIC OPTIMIZATION FOR ARBITRARY SHAPE SCENE TEXT DETECTION
    Zheng, Jinzhi
    Zhang, Libo
    Wu, Yanjun
    Zhao, Chen
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 5345 - 5349
  • [5] ScenarioDiff: Text-to-video Generation with Dynamic Transformations of Scene Conditions
    Zhang, Yipeng
    Wang, Xin
    Chen, Hong
    Qin, Chenyang
    Hao, Yibo
    Mei, Hong
    Zhu, Wenwu
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025,
  • [6] Inverse-Like Antagonistic Scene Text Spotting via Reading-Order Estimation and Dynamic Sampling
    Zhang, Shi-Xue
    Yang, Chun
    Zhu, Xiaobin
    Zhou, Hongyang
    Wang, Hongfa
    Yin, Xu-Cheng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 825 - 839
  • [7] DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer
    Ye, Maoyuan
    Zhang, Jing
    Zhao, Shanshan
    Liu, Juhua
    Du, Bo
    Tao, Dacheng
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3, 2023, : 3241 - 3249
  • [8] A Unified Approach for Text- and Image-guided 4D Scene Generation
    Zheng, Yufeng
    Li, Xueting
    Nagano, Koki
    Liu, Sifei
    Hilliges, Otmar
    De Mello, Shalini
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 7300 - 7309
  • [9] 3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation
    Jiang, Zutao
    Lu, Guansong
    Liang, Xiaodan
    Zhu, Jihua
    Zhang, Wei
    Chang, Xiaojun
    Xu, Hang
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 1, 2023, : 1051 - 1059
  • [10] Cross-modal Generation and Alignment via Attribute-guided Prompt for Unsupervised Text-Based Person Retrieval
    Li, Zongyi
    Li, Jianbo
    Shi, Yuxuan
    Ling, Hefei
    Chen, Jiazhong
    Wang, Runsheng
    Huang, Shijuan
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024, : 1047 - 1055