Collaborative Encoding Method for Scene Text Recognition in Low Linguistic Resources: The Uyghur Language Case Study

Cited by: 2
Authors
Xu, Miaomiao [1, 2, 3]
Zhang, Jiang [1]
Xu, Lianghui [1]
Silamu, Wushour [1, 2, 3]
Li, Yanbing [1, 2, 3]
Affiliations
[1] Xinjiang Univ, Coll Comp Sci & Technol, Urumqi 830017, Peoples R China
[2] Xinjiang Univ, Xinjiang Lab Multilanguage Informat Technol, Urumqi 830017, Peoples R China
[3] Xinjiang Univ, Xinjiang Multilingual Informat Technol Res Ctr, Urumqi 830017, Peoples R China
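Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 05
Funding
National Natural Science Foundation of China
Keywords
scene text recognition; low-resource languages; collaborative encoding; data augmentation
DOI
10.3390/app14051707
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract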
Current research on scene text recognition primarily focuses on languages with abundant linguistic resources, such as English and Chinese. In contrast, there is relatively limited research dedicated to low-resource languages. Advanced methods for scene text recognition often employ Transformer-based architectures. However, the performance of Transformer architectures is suboptimal when dealing with low-resource datasets. This paper proposes a Collaborative Encoding Method for Scene Text Recognition in the low-resource Uyghur language. The encoding framework comprises three main modules: the Filter module, the Dual-Branch Feature Extraction module, and the Dynamic Fusion module. The Filter module, consisting of a series of upsampling and downsampling operations, performs coarse-grained filtering on input images to reduce the impact of scene noise on the model, thereby obtaining more accurate feature information. The Dual-Branch Feature Extraction module adopts a parallel structure combining Transformer encoding and Convolutional Neural Network (CNN) encoding to capture local and global information. The Dynamic Fusion module employs an attention mechanism to dynamically merge the feature information obtained from the Transformer and CNN branches. To address the scarcity of real data for natural scene Uyghur text recognition, this paper conducted two rounds of data augmentation on a dataset of 7267 real images, resulting in 254,345 and 3,052,140 scene images, respectively. This process partially mitigated the issue of insufficient Uyghur language data, making low-resource scene text recognition research feasible. Experimental results demonstrate that the proposed collaborative encoding approach achieves outstanding performance. Compared to baseline methods, our collaborative encoding approach improves accuracy by 14.1%.
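As an illustration of the kind of attention-based merging the abstract attributes to the Dynamic Fusion module, the following is a minimal sketch, not the authors' implementation. It assumes both branches emit one feature vector per sequence position with matching dimensions, and that a single scoring vector (`query`, a hypothetical stand-in for learned parameters) rates each branch with a scaled dot product. The function and parameter names are illustrative only and do not come from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def dynamic_fusion(transformer_feats, cnn_feats, query):
    """Fuse two branch outputs position by position.

    transformer_feats, cnn_feats: lists of equal-length feature vectors,
        one vector per sequence position (assumed layout).
    query: hypothetical scoring vector standing in for learned weights.
    Returns (fused_feats, weights), where weights[i] is the pair
    (transformer_weight, cnn_weight) used at position i.
    """
    if len(transformer_feats) != len(cnn_feats):
        raise ValueError("both branches must cover the same positions")
    scale = math.sqrt(len(query))
    fused, weights = [], []
    for t_vec, c_vec in zip(transformer_feats, cnn_feats):
        # Score each branch, then normalise the two scores to sum to 1.
        w_t, w_c = softmax([dot(query, t_vec) / scale,
                            dot(query, c_vec) / scale])
        # Weighted sum of the two branch features at this position.
        fused.append([w_t * t + w_c * c for t, c in zip(t_vec, c_vec)])
        weights.append((w_t, w_c))
    return fused, weights
```

Because the weights are recomputed at every position, the mix between global (Transformer) and local (CNN) features can vary along the text line, which is the sense of "dynamic" assumed here; a trained model would learn the scoring parameters rather than take them as an argument.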
Pages: 16