QATIP - An Optical Character Recognition System for Arabic Heritage Collections in Libraries

Cited by: 6
Authors
Stahlberg, Felix [1 ]
Vogel, Stephan [1 ]
Affiliations
[1] HBKU, Qatar Comp Res Inst, Tornado Tower,18th Floor, Doha, Qatar
Source
PROCEEDINGS OF 12TH IAPR WORKSHOP ON DOCUMENT ANALYSIS SYSTEMS (DAS 2016) | 2016
DOI
10.1109/DAS.2016.81
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Nowadays, commercial optical character recognition (OCR) software achieves very high accuracy on high-quality scans of modern Arabic documents. However, a large fraction of Arabic heritage collections in libraries is usually more challenging - e.g. consisting of typewritten documents, early prints, and historical manuscripts. In this paper, we present our end-user oriented QATIP system for OCR in such documents. The recognition is based on the Kaldi toolkit and sophisticated text image normalization. This paper contains two main contributions: First, we describe the QATIP interface for libraries which consists of both a graphical user interface for adding and monitoring jobs and a web API for automated access. Second, we suggest novel approaches for language modelling and ligature modelling for continuous Arabic OCR. We test our QATIP system on an early print and a historical manuscript and report substantial improvements - e.g. 12.6% character error rate with QATIP compared to 51.8% with the best OCR product in our experimental setup (Tesseract).
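The character error rate (CER) figures quoted in the abstract can be read against the usual definition of the metric: the Levenshtein (edit) distance between the reference transcription and the OCR output, divided by the reference length. The sketch below assumes that common definition. The abstract does not say how the authors normalize text (whitespace, diacritics, ligatures) before scoring, so the function names and the choice to count Unicode code points are illustrative assumptions, not the QATIP evaluation code.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance counted over Unicode code points."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (0 if ref_char == hyp_char else 1)
            deletion = prev[j] + 1       # reference character missing from output
            insertion = curr[j - 1] + 1  # extra character in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / reference length (assumed definition)."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One dropped letter out of four reference characters.
    print(character_error_rate("كتاب", "كتب"))
```

Under this definition a CER of 12.6% corresponds to roughly one character edit per eight reference characters, and a CER can exceed 100% when the output contains many spurious insertions.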
Pages: 168-173
Number of pages: 6