A discrete Arabic script for better automatic document understanding

Authors
Abuhaiba, ISI [1]
Affiliations
[1] Islam Univ Gaza, Dept Elect & Comp Engn, Gaza, Israel
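Source
Keywords
document understanding; cursive Arabic script; discrete Arabic script; character segmentation; TrueType font; left white space; right white space
DOI
Not available
Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract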
This paper lays the groundwork for developing new fonts that produce, for the first time, discrete rather than cursive Arabic script. Such fonts aid automatic document understanding and can be used to print books, newspapers, periodicals, and all other printed materials. All other properties of the Arabic writing system are preserved when producing such fonts. The history of Arabic calligraphy since its beginning provides a strong defense of our call to break the cursive law of Arabic script. We were able to develop new fonts for discrete Arabic typography in which the characters can be segmented with simple vertical white cuts. Two parameters are investigated to suit the new requirements: the left and right white spaces. Nine A4 pages of Arabic script were used in our experiments to empirically determine a sufficient amount of these spaces. A font with left and right spaces of 160 FUnits each achieved a segmentation success rate of 99.99%.
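As an illustration only, the "simple vertical white cuts" mentioned in the abstract can be sketched as a column scan over a binarized text line: any column with no ink is white space, and each maximal run of inked columns is one character. The function names, the 0/1 image representation, and the default of 2048 units per em are assumptions made for this sketch, not details taken from the paper. The FUnit conversion uses the standard TrueType scaling formula.

```python
from typing import List, Tuple


def funits_to_pixels(funits: float, point_size: float, dpi: float,
                     units_per_em: int = 2048) -> float:
    """Convert font design units (FUnits) to device pixels.

    Standard TrueType scaling: pixels = FUnits * pointSize * dpi / (72 * unitsPerEm).
    units_per_em=2048 is an assumed, commonly used value, not one from the paper.
    """
    return funits * point_size * dpi / (72.0 * units_per_em)


def vertical_white_cuts(line_image: List[List[int]]) -> List[Tuple[int, int]]:
    """Segment a binarized text line into characters using all-white columns.

    line_image is a list of rows; each pixel is 1 for ink and 0 for background.
    Returns (start, end) column spans, end exclusive, in left-to-right image order.
    """
    if not line_image:
        return []
    width = len(line_image[0])
    # A column carries ink if any row has an ink pixel at that x position.
    has_ink = [any(row[x] for row in line_image) for x in range(width)]

    spans: List[Tuple[int, int]] = []
    start = None
    for x, inked in enumerate(has_ink):
        if inked and start is None:
            start = x                      # a character begins
        elif not inked and start is not None:
            spans.append((start, x))       # a white column closes the character
            start = None
    if start is not None:
        spans.append((start, width))       # character touching the right edge
    return spans
```

For example, `vertical_white_cuts([[0, 1, 1, 0, 0, 1, 0], [0, 1, 0, 0, 0, 1, 1]])` yields the spans `(1, 3)` and `(5, 7)`. Because Arabic is read right to left, a caller would typically reverse the span list to obtain reading order. Under the assumed 2048 units per em, 160 FUnits at 12 pt and 300 dpi corresponds to roughly 3.9 pixels of white space on each side of a glyph.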
Pages: 77-94
Page count: 18