A machine-learning approach for analyzing document layout structures with two reading orders

Cited by: 9
Authors
Wu, Chung-Chih [1]
Chou, Chien-Hsing [1]
Chang, Fu [1]
Affiliations
[1] Acad Sinica, Inst Informat Sci, Taipei 115, Taiwan
Keywords
binary decision; document layout analysis; reading order; support vector machine; taboo box; textline; text region
DOI
10.1016/j.patcog.2008.03.014
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The purpose of document layout analysis is to locate textlines and text regions in document images, mostly via a series of split-or-merge operations. Before applying such an operation, however, it is necessary to examine the context to decide whether the place chosen for the operation is appropriate. We thus view document layout analysis as a matter of solving a series of binary decision problems, such as whether or not to apply a split-or-merge operation at a chosen place. To solve these problems, we use support vector machines to learn whether or not to apply the previously mentioned operations from training documents in which all textlines and text regions have been located and their identities labeled. The proposed approach is very effective for analyzing documents that allow both horizontal and vertical reading orders. When applied to a test data set composed of eight types of layout structure, the approach's accuracy rates for identifying textlines and text regions are 98.83% and 96.72%, respectively. © 2008 Elsevier Ltd. All rights reserved.
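The binary-decision framing in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the two features (a normalized gap and an overlap ratio between two neighboring components), the toy training data, and the function names train_linear_svm and should_merge are all hypothetical, and a plain linear SVM trained by sub-gradient descent on the hinge loss stands in for whatever SVM configuration the paper actually uses.

```python
# Illustrative sketch only: a linear SVM deciding "merge" vs "do not merge"
# for a pair of neighboring components. Features, data, and names are
# hypothetical and are not taken from the paper.

def train_linear_svm(samples, labels, lam=0.01, epochs=200, lr=0.1):
    """Sub-gradient descent on the L2-regularized hinge loss.

    samples: list of feature tuples; labels: +1 (merge) or -1 (do not merge).
    Returns (weights, bias).
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(w * xi for w, xi in zip(weights, x)) + bias)
            # Shrink the weights (gradient of the L2 regularization term).
            weights = [w * (1.0 - lr * lam) for w in weights]
            # Hinge-loss step only when the sample violates the margin.
            if margin < 1.0:
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
    return weights, bias


def should_merge(weights, bias, features):
    """Binary decision: True means apply the merge operation at this place."""
    return sum(w * xi for w, xi in zip(weights, features)) + bias > 0


# Hypothetical features: (gap / character height, overlap ratio).
training_samples = [
    (0.1, 0.9), (0.2, 0.8), (0.15, 0.95), (0.3, 0.85),   # labeled "merge"
    (1.5, 0.2), (2.0, 0.1), (1.2, 0.3), (1.8, 0.0),      # labeled "do not merge"
]
training_labels = [1, 1, 1, 1, -1, -1, -1, -1]

weights, bias = train_linear_svm(training_samples, training_labels)

# Query the learned decision for two unseen candidate places.
print(should_merge(weights, bias, (0.2, 0.9)))
print(should_merge(weights, bias, (1.6, 0.15)))
```

In this sketch, each candidate place for a split-or-merge operation becomes one feature vector and the classifier returns a yes/no answer, which mirrors the series of binary decisions the abstract describes. A real system would derive its features from the labeled training documents rather than from hand-written values.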
Pages: 3200-3213
Number of pages: 14