Layout Analysis for Arabic Historical Document Images Using Machine Learning

Cited by: 43
Authors
Bukhari, Syed Saqib [1]
Breuel, Thomas M. [1]
Asi, Abedelkadir [2]
El-Sana, Jihad [2]
Affiliations
[1] Tech Univ Kaiserslautern, Kaiserslautern, Germany
[2] Ben Gurion Univ Negev, Negev, Israel
Funding
Israel Science Foundation
Keywords
SEGMENTATION
DOI
10.1109/ICFHR.2012.227
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Page layout analysis is a fundamental step of any document image understanding system. We introduce an approach that segments text appearing in page margins (also known as side-notes text) from manuscripts with complex layout formats. Simple and discriminative features are extracted at the connected-component level, and robust feature vectors are then generated. A multi-layer perceptron classifier is used to classify connected components into the relevant class of text. A voting scheme is then applied to refine the resulting segmentation and produce the final classification. In contrast to state-of-the-art segmentation approaches, this method is independent of block segmentation as well as pixel-level analysis. The proposed method has been trained and tested on a dataset that contains a variety of complex side-notes layout formats, achieving a segmentation accuracy of about 95%.
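The abstract does not say how the voting scheme works, so the following is only a minimal sketch of one plausible refinement step, not the authors' method. It assumes each connected component already has a centroid and a preliminary class label (for example from a classifier), and relabels it by majority vote over itself and its nearest neighbours. The function name `refine_labels`, the parameter `k`, and the labels `"main"` and `"side"` are hypothetical.

```python
from collections import Counter
from math import hypot


def refine_labels(centroids, labels, k=4):
    """Relabel each component by majority vote over itself and its
    k nearest neighbours (assumed scheme, not taken from the paper)."""
    refined = []
    for i, (xi, yi) in enumerate(centroids):
        # Rank all other components by Euclidean distance to component i.
        others = sorted(
            (j for j in range(len(centroids)) if j != i),
            key=lambda j: hypot(centroids[j][0] - xi, centroids[j][1] - yi),
        )
        # Votes always come from the preliminary labels, so the order in
        # which components are processed does not change the result.
        votes = [labels[i]] + [labels[j] for j in others[:k]]
        refined.append(Counter(votes).most_common(1)[0][0])
    return refined


if __name__ == "__main__":
    # Two columns of components: main text at x=100, side notes at x=10.
    centroids = [(100, y) for y in range(10, 60, 10)] + \
                [(10, y) for y in range(10, 60, 10)]
    # One preliminary label in each column is wrong.
    labels = ["main", "main", "side", "main", "main",
              "side", "side", "main", "side", "side"]
    print(refine_labels(centroids, labels))
```

With two classes and an even `k`, each component receives an odd number of votes, so ties cannot occur in this sketch.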
Pages: 639-644
Number of pages: 6