Layout Analysis for Arabic Historical Document Images Using Machine Learning

Cited by: 43
Authors
Bukhari, Syed Saqib [1]
Breuel, Thomas M. [1]
Asi, Abedelkadir [2]
El-Sana, Jihad [2]
Affiliations
[1] Tech Univ Kaiserslautern, Kaiserslautern, Germany
[2] Ben Gurion Univ Negev, Negev, Israel
Funding
Israel Science Foundation;
Keywords
SEGMENTATION;
DOI
10.1109/ICFHR.2012.227
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Page layout analysis is a fundamental step of any document image understanding system. We introduce an approach that segments text appearing in page margins (a.k.a. side-note text) from the main text in manuscripts with complex layouts. Simple and discriminative features are extracted at the connected-component level, and robust feature vectors are then generated from them. A multi-layer perceptron classifier is used to assign each connected component to the relevant class of text. A voting scheme is then applied to refine the resulting segmentation and produce the final classification. In contrast to state-of-the-art segmentation approaches, this method is independent of block segmentation as well as pixel-level analysis. The proposed method has been trained and tested on a dataset that contains a variety of complex side-note layout formats, achieving a segmentation accuracy of about 95%.
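The abstract does not spell out how the voting scheme works, so the following is only a minimal sketch of one plausible refinement step, not the authors' method. It assumes each connected component already has a centroid and an initial class label (for example from a classifier), and relabels a component by majority vote over itself and its nearest neighbours. The function name `refine_labels_by_voting`, the parameter `k`, and the label strings `"main"` and `"side"` are all hypothetical.

```python
from collections import Counter
from math import hypot


def refine_labels_by_voting(centroids, labels, k=3):
    """Relabel each component by majority vote over itself and its k nearest
    neighbours (Euclidean distance between centroids).

    Hypothetical illustration: the neighbourhood definition and the tie rule
    are assumptions, not details taken from the paper.
    """
    refined = []
    for i, (xi, yi) in enumerate(centroids):
        # Rank all other components by distance to component i.
        others = sorted(
            (j for j in range(len(centroids)) if j != i),
            key=lambda j: hypot(centroids[j][0] - xi, centroids[j][1] - yi),
        )
        voters = [i] + others[:k]
        # Votes are counted on the initial labels, not the refined ones.
        counts = Counter(labels[j] for j in voters)
        top_label, top_count = counts.most_common(1)[0]
        # On a tie for first place, keep the component's own initial label.
        tied = [lab for lab, c in counts.items() if c == top_count]
        refined.append(labels[i] if labels[i] in tied else top_label)
    return refined


# Toy example: one isolated "side" label inside a run of main-text components.
centroids = [(0, 0), (1, 0), (2, 0), (3, 0), (100, 0), (101, 0), (102, 0)]
initial = ["main", "side", "main", "main", "side", "side", "side"]
print(refine_labels_by_voting(centroids, initial, k=2))
```

Counting votes on the initial labels rather than on already-refined ones keeps the result independent of the order in which components are visited.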
Pages: 639-644
Number of pages: 6