Adapting multilingual vision language transformers for low-resource Urdu optical character recognition (OCR)

Authors
Cheema M.D.A. [1]
Shaiq M.D. [1]
Mirza F. [2]
Kamal A. [1]
Naeem M.A. [1]
Affiliations
[1] Department of Artificial Intelligence and Data Science, National University of Computer and Emerging Sciences, Islamabad
[2] School of Computer, Engineering and Mathematical Sciences, Auckland University of Technology, Auckland
Keywords
Document analysis; Multilingual; OCR; Performance evaluation; Transformer based models; Urdu OCR;
DOI
10.7717/PEERJ-CS.1964
Chinese Library Classification (CLC) number
O43 [Optics]; T [Industrial Technology];
Discipline classification code
070207 ; 08 ; 0803 ;
Abstract
In the realm of digitizing written content, the challenges posed by low-resource languages are noteworthy. These languages, often lacking in comprehensive linguistic resources, require specialized attention to develop robust systems for accurate optical character recognition (OCR). This article addresses the significance of focusing on such languages and introduces ViLanOCR, an innovative bilingual OCR system tailored for Urdu and English. Unlike existing systems, which struggle with the intricacies of low-resource languages, ViLanOCR leverages advanced multilingual transformer-based language models to achieve superior performance. The proposed approach is evaluated using the character error rate (CER) metric and achieves state-of-the-art results on the Urdu UHWR dataset, with a CER of 1.1%. The experimental results demonstrate the effectiveness of the proposed approach, surpassing state-of-the-art baselines in Urdu handwriting digitization. © (2023) PeerJ Inc. All Rights Reserved.
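For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of character error rate under its common definition: the Levenshtein (edit) distance between the recognized text and the reference text, divided by the reference length. The abstract does not say how the authors compute CER, so this definition, the function names `edit_distance` and `character_error_rate`, and the choice to count Unicode code points are all assumptions made for illustration.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions (each costing 1)."""
    # previous[j] holds the distance between the reference prefix
    # processed so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(
                min(
                    previous[j] + 1,                      # delete from reference
                    current[j - 1] + 1,                   # insert into reference
                    previous[j - 1] + substitution_cost,  # match or substitute
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference.

    Assumption: characters are Unicode code points, as Python strings
    index them. The result can exceed 1.0 when the hypothesis is much
    longer than the reference.
    """
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Under this definition, a CER of 1.1% corresponds to roughly 11 character edits per 1,000 reference characters. For Urdu text, counting code points treats a base letter and an attached diacritic as two characters; an evaluation that counts grapheme clusters instead would give different figures.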
Pages: 1 / 24
Page count: 23