Weakly supervised scene text generation for low-resource languages

Cited by: 3

Authors:

Xie, Yangchen [1]

Chen, Xinyuan [2]

Zhan, Hongjian [1,3]

Shivakumara, Palaiahnakote [4]

Yin, Bing [5]

Liu, Cong [5]

Lu, Yue [1]

Affiliations:

[1] East China Normal Univ, Sch Commun & Elect Engn, Shanghai 200241, Peoples R China

[2] Shanghai AI Lab, Shanghai, Peoples R China

[3] Chongqing Inst East China Normal Univ, Chongqing Key Lab Precis Opt, Chongqing 401120, Peoples R China

[4] Univ Malaya, Fac Comp Sci & Informat Technol FSKTM, Kuala Lumpur, Malaysia

[5] iFLYTEK Res, Hefei, Peoples R China

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2024 / Vol. 237

Keywords:

Scene text generation; Style transfer; Low-resource languages

DOI:

10.1016/j.eswa.2023.121622

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

A large number of annotated training images is crucial for training successful scene text recognition models. However, collecting sufficient datasets can be a labor-intensive and costly process, particularly for low-resource languages. To address this challenge, auto-generating text data has shown promise in alleviating the problem. Unfortunately, existing scene text generation methods typically rely on a large amount of paired data, which is difficult to obtain for low-resource languages. In this paper, we propose a novel weakly supervised scene text generation method that leverages a few recognition-level labels as weak supervision. The proposed method can generate a large amount of scene text images with diverse backgrounds and font styles through cross-language generation. Our method disentangles the content and style features of scene text images, with the former representing textual information and the latter representing characteristics such as font, alignment, and background. To preserve the complete content structure of generated images, we introduce an integrated attention module. Furthermore, to bridge the style gap between different languages, we incorporate a pre-trained font classifier. We evaluate our method using state-of-the-art scene text recognition models. Experiments demonstrate that our generated scene text significantly improves the scene text recognition accuracy and helps achieve higher accuracy when complemented with other generative methods.

Pages: 12

50 items in total

[1] Weakly Supervised POS Taggers Perform Poorly on Truly Low-Resource Languages
Kann, Katharina
Lacroix, Ophelie
Sogaard, Anders
THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 8066 - 8073
[2] Efficient Entity Candidate Generation for Low-Resource Languages
Garcia-Duran, Alberto
Arora, Akhil
West, Robert
LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022 : 6429 - 6438
[3] XAlign: Cross-lingual Fact-to-Text Alignment and Generation for Low-Resource Languages
Abhishek, Tushar
Sagare, Shivprasad
Singh, Bhavyajeet
Sharma, Anubhav
Gupta, Manish
Varma, Vasudeva
COMPANION PROCEEDINGS OF THE WEB CONFERENCE 2022, WWW 2022 COMPANION, 2022 : 171 - 175
[4] Hybrid Approach Text Generation for Low-Resource Language
Rakhimova, Diana
Adali, Esref
Karibayeva, Aidana
ADVANCES IN COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2024, PART I, 2024, 2165 : 256 - 268
[5] Hybrid Encoding Method for Scene Text Recognition in Low-Resource Uyghur
Xu, Miaomiao
Zhang, Jiang
Xu, Lianghui
Li, Yanbing
Silamu, Wushour
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT VII, 2025, 15037 : 86 - 99
[6] Harnessing Knowledge Distillation for Enhanced Text-to-Text Translation in Low-Resource Languages
Ahmed, Manar Ouled
Ming, Zuheng
Othmani, Alice
SPEECH AND COMPUTER, SPECOM 2024, PT II, 2025, 15300 : 295 - 307
[7] Exploring low-resource medical image classification with weakly supervised prompt learning
Zheng, Fudan
Cao, Jindong
Yu, Weijiang
Chen, Zhiguang
Xiao, Nong
Lu, Yutong
PATTERN RECOGNITION, 2024, 149
[8] SUPERVISED AND UNSUPERVISED ACTIVE LEARNING FOR AUTOMATIC SPEECH RECOGNITION OF LOW-RESOURCE LANGUAGES
Syed, Ali Raza
Rosenberg, Andrew
Kislal, Ellen
2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016 : 5320 - 5324
[9] Weakly Supervised Attention Rectification for Scene Text Recognition
Gu, Chengyu
Wang, Shilin
Zhu, Yiwei
Huang, Zheng
Chen, Kai
2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021 : 779 - 786
[10] Dual Feature Enhanced Scene Text Recognition Method for Low-Resource Uyghur
Xu, Miaomiao
Zhang, Jiang
Xu, Lianghui
Li, Yanbing
Silamu, Wushour
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT VII, 2025, 15037 : 58 - 71