A discrete Arabic script for better automatic document understanding

Cited by: 0
Author
Abuhaiba, ISI [1]
Affiliation
[1] Islam Univ Gaza, Dept Elect & Comp Engn, Gaza, Israel
Keywords
document understanding; cursive Arabic script; discrete Arabic script; character segmentation; TrueType font; left white space; right white space
DOI
Not available
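Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]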
Discipline classification codes
07; 0710; 09
Abstract
This paper lays the groundwork for the development of new fonts that produce, for the first time, discrete Arabic script instead of cursive Arabic script. These fonts help in automatic document understanding and can be used to print books, newspapers, periodicals, and all other printed materials. All other properties of the Arabic writing system are preserved when producing such fonts. The history of Arabic calligraphy since its beginning provides a strong defense of our call to break the cursive law of Arabic script. We could develop new fonts for discrete Arabic typography such that the characters can be segmented with simple vertical white cuts. Two parameters are investigated to suit the new requirements: the left and right white spaces. Nine A4 pages of Arabic script were used in our experiments to empirically determine a sufficient amount of these spaces. A font with left and right spaces of 160 FUnits each achieved a segmentation success rate of 99.99%.
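The segmentation idea in the abstract, cutting characters apart wherever a fully white pixel column separates them, can be illustrated with a minimal sketch. This is not the author's implementation: the function names, the binary-image representation (lists of 0/1 rows, 1 = ink), and the rendering parameters (2048 units per em, 12 pt, 300 dpi) are assumptions chosen for illustration. Only the 160 FUnits figure comes from the abstract. The conversion uses the standard TrueType scaling of font units to pixels.

```python
from typing import List, Tuple


def funits_to_pixels(funits: float, point_size: float, dpi: float,
                     units_per_em: int = 2048) -> float:
    """Convert font design units (FUnits) to device pixels.

    units_per_em=2048 is a common TrueType value, assumed here;
    the source does not state the em size of its fonts.
    """
    return funits * point_size * dpi / (72.0 * units_per_em)


def vertical_white_cuts(image: List[List[int]]) -> List[Tuple[int, int]]:
    """Split a binary text-line image into character column ranges.

    `image` is a list of rows; 1 marks an ink pixel, 0 marks white.
    Returns (start, end) column ranges, end exclusive, one per run of
    columns that contain ink. Runs are separated by all-white columns.
    """
    if not image:
        return []
    width = len(image[0])
    # A column counts as "ink" if any row has a foreground pixel in it.
    has_ink = [any(row[x] for row in image) for x in range(width)]

    segments: List[Tuple[int, int]] = []
    start = None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # entering a character
        elif not ink and start is not None:
            segments.append((start, x))    # white column ends the character
            start = None
    if start is not None:
        segments.append((start, width))    # character touches the right edge
    return segments


# Toy text line with two "characters" separated by two white columns.
line = [
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 1],
]
segments = vertical_white_cuts(line)

# Width in pixels of one 160-FUnit side bearing under the assumed settings.
bearing_px = funits_to_pixels(160, point_size=12, dpi=300)
```

Under these assumed settings, one 160-FUnit space is about 3.9 pixels wide, so the right space of one glyph plus the left space of its neighbor leaves a gap of roughly 7.8 white columns for the cut to land in.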
Pages: 77-94
Page count: 18