Fine-grained Pseudo Labels for Scene Text Recognition

Cited by: 0

Authors:

Li, Xiaoyu [1]

Chen, Xiaoxue [1]

Huang, Zuming [1]

Xie, Lele [1]

Chen, Jingdong [1]

Yang, Ming [1]

Affiliations:

[1] Ant Group, Hangzhou, People's Republic of China

Source:

PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023

Keywords:

pseudo labels; domain shift; scene text recognition;

DOI:

10.1145/3581783.3611791

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Pseudo-labeling-based semi-supervised learning has shown promising results in Scene Text Recognition (STR). Most existing methods use a pre-trained model to generate sequence-level pseudo labels for text images and then re-train the model. Recently, pseudo-labeling in a teacher-student framework, in which a student model is supervised by the pseudo labels from a teacher model, has become increasingly popular; it trains end to end and yields outstanding performance in semi-supervised learning. However, applying this framework directly to pseudo-labeling for STR leads to unstable convergence, because generating pseudo labels at the coarse-grained sequence level uses unlabeled data inefficiently. Furthermore, the inherent domain shift between labeled and unlabeled data lowers the quality of the derived pseudo labels. To mitigate these issues, we propose a novel Cross-domain Pseudo-Labeling (CPL) approach for scene text recognition, which makes better use of unlabeled data at the character level and provides more accurate pseudo labels. Specifically, our proposed Pseudo-Labeled Curriculum Learning dynamically adjusts the thresholds for different character classes according to the model's learning status. Moreover, an Adaptive Distribution Regularizer is employed to bridge the domain gap and improve the quality of pseudo labels. Extensive experiments show that CPL boosts representative STR models to achieve state-of-the-art results on six challenging STR benchmarks. It also generalizes effectively to handwritten text.
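
The idea of character-level pseudo labels with class-wise dynamic thresholds can be illustrated with a small sketch. The abstract does not give the actual update rule, so everything specific here is an assumption: the function names `class_thresholds` and `character_pseudo_labels` are hypothetical, the base threshold of 0.9 is arbitrary, and the scaling rule is borrowed from FlexMatch-style curriculum pseudo-labeling rather than taken from the paper.

```python
def class_thresholds(confident_counts, num_classes, base_threshold=0.9):
    """Return one confidence threshold per character class.

    confident_counts maps a class index to how many unlabeled characters
    were confidently predicted as that class so far; this count stands in
    for the "learning status" of the class (an assumption of this sketch).
    """
    max_count = max(confident_counts.get(c, 0) for c in range(num_classes))
    thresholds = []
    for c in range(num_classes):
        # Normalized learning status in [0, 1]; 0 if nothing is learned yet.
        status = confident_counts.get(c, 0) / max_count if max_count > 0 else 0.0
        # FlexMatch-style convex mapping (assumed, not from the paper):
        # well-learned classes get thresholds near base_threshold,
        # poorly learned classes get lower ones.
        thresholds.append(base_threshold * status / (2.0 - status))
    return thresholds


def character_pseudo_labels(char_probs, thresholds):
    """Assign a pseudo label to each character position and decide whether to keep it.

    char_probs holds one probability list per decoding step. Each position is
    filtered on its own, so one uncertain character does not discard the
    whole sequence, unlike sequence-level filtering.
    """
    result = []
    for probs in char_probs:
        label = max(range(len(probs)), key=probs.__getitem__)
        keep = probs[label] >= thresholds[label]
        result.append((label, keep))
    return result
```

In this sketch, a class whose learning status is 0 gets a threshold of 0, so all of its predictions are kept; a practical implementation would likely add a warm-up phase or a lower bound to avoid that.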


Pages: 5786 - 5795

Page count: 10

