Layout Analysis for Arabic Historical Document Images Using Machine Learning

Cited by: 43
Authors
Bukhari, Syed Saqib [1]
Breuel, Thomas M. [1]
Asi, Abedelkadir [2]
El-Sana, Jihad [2]
Affiliations
[1] Tech Univ Kaiserslautern, Kaiserslautern, Germany
[2] Ben Gurion Univ Negev, Negev, Israel
Funding
Israel Science Foundation
Keywords
SEGMENTATION
DOI
10.1109/ICFHR.2012.227
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Page layout analysis is a fundamental step in any document image understanding system. We introduce an approach that segments text appearing in page margins (a.k.a. side-note text) from manuscripts with complex layouts. Simple, discriminative features are extracted at the connected-component level, and robust feature vectors are then generated from them. A multi-layer perceptron classifier is used to assign each connected component to the relevant text class. A voting scheme is then applied to refine the resulting segmentation and produce the final classification. In contrast to state-of-the-art segmentation approaches, this method is independent of block segmentation as well as pixel-level analysis. The proposed method was trained and tested on a dataset containing a variety of complex side-note layout formats and achieved a segmentation accuracy of about 95%.
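The refinement step named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the voting rule, so everything below is an assumption made for illustration: the `Component` type, the class names "main" and "side", the use of centroid distance, and the choice of `k` are hypothetical and are not taken from the paper. The sketch relabels each connected component with the majority label among itself and its `k` nearest neighbours, starting from labels that a classifier (such as the multi-layer perceptron) is assumed to have produced already.

```python
from collections import Counter
from dataclasses import dataclass
from math import hypot


@dataclass
class Component:
    """A connected component: centroid plus a classifier-assigned label (hypothetical type)."""
    x: float
    y: float
    label: str  # assumed class names: "main" or "side"


def refine_by_voting(components, k=4):
    """Return one label per component: the majority among itself and its k nearest neighbours.

    With two classes and an even k the vote count is odd, so no ties occur.
    """
    refined = []
    for c in components:
        # Rank all other components by Euclidean distance between centroids.
        others = sorted(
            (o for o in components if o is not c),
            key=lambda o: hypot(o.x - c.x, o.y - c.y),
        )
        votes = Counter([c.label] + [o.label for o in others[:k]])
        refined.append(votes.most_common(1)[0][0])
    return refined


# Toy page: a main-text column near x = 100 and a side-note column near x = 10.
# The second side-note component is deliberately given the wrong initial label.
components = [
    Component(100, 10, "main"), Component(102, 20, "main"),
    Component(101, 30, "main"), Component(103, 40, "main"),
    Component(100, 50, "main"),
    Component(10, 10, "side"), Component(12, 20, "main"),
    Component(11, 30, "side"), Component(10, 40, "side"),
    Component(12, 50, "side"),
]
refined_labels = refine_by_voting(components, k=4)
```

In this toy layout the mislabelled component is surrounded by side-note components, so the vote flips its label to "side" while every other component keeps its label.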
Pages: 639-644
Number of pages: 6