Adapting multilingual vision language transformers for low-resource Urdu optical character recognition (OCR)

Cited by: 0

Authors:

Cheema M.D.A. [1]

Shaiq M.D. [1]

Mirza F. [2]

Kamal A. [1]

Naeem M.A. [1]

Affiliations:

[1] Department of Artificial Intelligence and Data Science, National University of Computer and Emerging Sciences, Islamabad

[2] School of Computer, Engineering and Mathematical Sciences, Auckland University of Technology, Auckland

Source:

PeerJ Computer Science | 2024 / Vol. 10

Keywords:

Document analysis; Multilingual; OCR; Performance evaluation; Transformer-based models; Urdu OCR

DOI:

10.7717/PEERJ-CS.1964

Chinese Library Classification (CLC) numbers:

O43 [Optics]; T [Industrial Technology]

Subject classification codes:

070207; 08; 0803

Abstract:

In the realm of digitizing written content, low-resource languages pose notable challenges. These languages often lack comprehensive linguistic resources and therefore require specialized attention to develop robust systems for accurate optical character recognition (OCR). This article addresses the significance of focusing on such languages and introduces ViLanOCR, an innovative bilingual OCR system tailored for Urdu and English. Unlike existing systems, which struggle with the intricacies of low-resource languages, ViLanOCR leverages advanced multilingual transformer-based language models to achieve superior performance. The proposed approach is evaluated using the character error rate (CER) metric and achieves state-of-the-art results on the Urdu UHWR dataset, with a CER of 1.1%. The experimental results demonstrate the effectiveness of the proposed approach, which surpasses state-of-the-art baselines in Urdu handwriting digitization. © 2023 PeerJ Inc. All Rights Reserved.
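The abstract reports results as character error rate (CER). A minimal sketch follows, assuming the standard definition CER = (S + D + I) / N, where S, D and I are the character substitutions, deletions and insertions in a minimum-cost alignment of the OCR output against the ground truth, and N is the number of ground-truth characters. The function names `levenshtein` and `cer` are illustrative only; this is not the authors' evaluation code. It also assumes characters are counted as Unicode code points with no normalization, a choice that can change scores for Urdu script.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions and insertions
    needed to align `hyp` with `ref` (classic dynamic programming)."""
    # prev[j] = edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: ref character missing from hyp
                curr[j - 1] + 1,     # insertion: extra character in hyp
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length.
    Assumption: characters are Unicode code points, no normalization applied."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    print(cer("kitten", "sitting"))  # 0.5 (3 edits / 6 reference characters)
    print(cer("اردو", "ارد"))        # 0.25 (1 deleted code point / 4)
```

Under this definition, a CER of 1.1% corresponds to about 11 character edits per 1,000 ground-truth characters. Corpus-level CER is often computed by summing edit distances and reference lengths over all lines rather than averaging per-line scores; the abstract does not state which convention was used.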

Pages: 1 / 24

Page count: 23
