A structure for annotation and ground-truthing of Urdu handwritten text image corpus

Cited by: 1
Authors
Choudhary, Prakash [1]
Nain, Neeta [2]
Ahmed, Mushtaq [2]
Affiliations
[1] Natl Inst Technol Manipur, Imphal 795001, Manipur, India
[2] Malaviya Natl Inst Technol Jaipur, Jaipur 302017, Rajasthan, India
Keywords
Urdu Corpus; Annotation; Groundtruthing; Handwritten Documents; Documents Analysis;
DOI
10.1016/j.sbspro.2015.07.422
Chinese Library Classification (CLC) number
H0 [Linguistics];
Discipline classification codes
030303 ; 0501 ; 050102 ;
Abstract
Over the last few decades, considerable progress has been made in the field of handwriting recognition. Handwritten material is becoming scarcer with the current trend toward digital electronics. However, research on a particular language requires a large database of handwritten documents. In this paper we describe our approach to developing a large corpus of Urdu handwritten text images. To make the database usable across the broad field of Natural Language Processing, we annotate each image and associate XML-based ground-truth meta-information with it, making it machine-readable as a linguistic resource. This paper focuses on some issues related to corpus design and annotation, such as data collection, writer selection, and annotation methodology. (C) 2015 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license.
Pages: 84-88
Page count: 5