Ligature-based font size independent OCR for Noori Nastalique writing style

Authors
Akram, Qurat Ul Ain [1]
Hussain, Sarmad [1]
Affiliations
[1] Univ Engn & Technol, Ctr Language Engn, Al Khawarizmi Inst Comp Sci, Lahore, Pakistan
Keywords
Urdu; Noori Nastalique; Font Size Independent; Image Resizing; Ligature; Main Body; Optical Character Recognition (OCR); Recognition
DOI
Not available
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper presents a font-size-independent Optical Character Recognition (OCR) system for Urdu document images. Urdu documents are written in the Noori Nastalique writing style, with different font sizes for normal text and headings. Most current state-of-the-art Urdu OCR techniques support recognition of text in a single font size. The presented study deals with the recognition of Nastalique text in font sizes 14 to 28. Three recognizers are developed at three font sizes (called pivots): 14, 16 and 22. Urdu document images in the remaining font sizes, such as 18, 20, 24, 26 and 28, are resized to the nearest pivot font size using the nearest-neighbor interpolation technique so that they can be recognized. A detailed analysis has been carried out to compute the optimal scaling factor for each font size to improve recognition results. It has been observed that the recognizers perform better on images resized with the optimal scaling factors than on images resized with the simply computed scaling factors. The system is developed and matured on 1,965 main body classes covering 59,974 high-frequency Urdu words. After maturation, the system has 97.20%, 97.08%, 95.13%, 95.65%, 96.26%, 96.52%, 95.78%, 96.38%, 96.66% main body recognition accuracy for 14, 16, 18, 20, 24, 26, 28 font sizes respectively.
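As a rough illustration of the resizing step described in the abstract, the Python sketch below picks the nearest pivot font size, computes a simple scaling factor, and resizes an image with nearest-neighbor interpolation. The function names, the pivot/size ratio used as the "simple computed" scaling factor, the tie-breaking rule, and the coordinate mapping are all assumptions made for illustration. The optimal scaling factors found in the paper are not stated in the abstract and are not reproduced here.

```python
PIVOT_SIZES = (14, 16, 22)  # pivot font sizes named in the abstract


def nearest_pivot(font_size, pivots=PIVOT_SIZES):
    """Return the pivot closest to font_size (assumed: ties go to the smaller pivot)."""
    return min(sorted(pivots), key=lambda p: abs(p - font_size))


def simple_scaling_factor(font_size, pivots=PIVOT_SIZES):
    """Assumed 'simple' factor: nearest pivot size divided by the source font size."""
    return nearest_pivot(font_size, pivots) / font_size


def resize_nearest_neighbor(image, factor):
    """Resize a 2D list of pixel values by copying the nearest source pixel."""
    src_h, src_w = len(image), len(image[0])
    dst_h = max(1, round(src_h * factor))
    dst_w = max(1, round(src_w * factor))
    resized = []
    for y in range(dst_h):
        # Map each destination row back to a source row, clamped to the image.
        src_y = min(src_h - 1, int(y / factor))
        row = [image[src_y][min(src_w - 1, int(x / factor))] for x in range(dst_w)]
        resized.append(row)
    return resized


if __name__ == "__main__":
    for size in (18, 20, 24, 26, 28):
        print(size, nearest_pivot(size), round(simple_scaling_factor(size), 3))
```

With these assumptions, font size 18 maps to pivot 16, while 20, 24, 26 and 28 map to pivot 22. A tuned per-size factor, as the abstract describes, would replace the value returned by `simple_scaling_factor`.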
Pages: 129-133
Page count: 5