A machine-learning approach for analyzing document layout structures with two reading orders

Cited by: 9
Authors
Wu, Chung-Chih [1]
Chou, Chien-Hsing [1]
Chang, Fu [1]
Affiliation
[1] Academia Sinica, Institute of Information Science, Taipei 115, Taiwan
Keywords
binary decision; document layout analysis; reading order; support vector machine; taboo box; textline; text region
DOI
10.1016/j.patcog.2008.03.014
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The purpose of document layout analysis is to locate textlines and text regions in document images, mostly via a series of split-or-merge operations. Before applying such an operation, however, it is necessary to examine the context to decide whether the place chosen for the operation is appropriate. We thus view document layout analysis as a series of binary decision problems, such as whether or not to apply a split-or-merge operation at a chosen place. To solve these problems, we use support vector machines to learn whether or not to apply these operations from training documents in which all textlines and text regions have been located and their identities labeled. The proposed approach is very effective for analyzing documents that allow both horizontal and vertical reading orders. When applied to a test data set composed of eight types of layout structure, the approach identifies textlines and text regions with accuracy rates of 98.83% and 96.72%, respectively. (C) 2008 Elsevier Ltd. All rights reserved.
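The abstract frames each split-or-merge step as a binary decision learned by a support vector machine. The Python sketch below illustrates only that framing for a single merge/no-merge decision; it is not the authors' implementation. The two features (gap between two adjacent components along the reading direction divided by text height, and their relative height difference), the training samples, and the names `train_linear_svm` and `should_merge` are all hypothetical. Because the abstract does not state the kernel or the features, a plain linear SVM trained by dual coordinate descent (the L1-loss formulation used in LIBLINEAR, with the bias folded in as a constant feature) stands in for whatever SVM configuration the paper actually uses.

```python
# Hypothetical sketch: one merge / no-merge decision learned by a linear SVM.
# Features and data are illustrative assumptions, not taken from the paper.
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(
    xs: Sequence[Sequence[float]],
    ys: Sequence[int],
    c: float = 100.0,
    passes: int = 1000,
) -> List[float]:
    """Train an L1-loss linear SVM by dual coordinate descent.

    Labels must be +1 or -1. A constant 1.0 is appended to each sample so
    the last weight acts as a (regularized) bias term.
    """
    data = [list(x) + [1.0] for x in xs]
    w = [0.0] * len(data[0])
    alpha = [0.0] * len(data)           # dual variables, kept in [0, c]
    q_diag = [dot(x, x) for x in data]  # diagonal of the dual Hessian
    for _ in range(passes):
        for i, (x, y) in enumerate(zip(data, ys)):
            grad = y * dot(w, x) - 1.0  # partial derivative of the dual w.r.t. alpha[i]
            new_alpha = min(max(alpha[i] - grad / q_diag[i], 0.0), c)
            delta = new_alpha - alpha[i]
            if delta != 0.0:
                alpha[i] = new_alpha
                # Keep w = sum_j alpha[j] * y[j] * x[j] up to date.
                for k in range(len(w)):
                    w[k] += delta * y * x[k]
    return w


def should_merge(w: Sequence[float], features: Sequence[float]) -> bool:
    """Return True when the SVM decision value is positive (merge)."""
    return dot(w, list(features) + [1.0]) > 0.0


if __name__ == "__main__":
    # Hypothetical features per pair of adjacent components:
    # (gap along the reading direction / text height, relative height difference)
    train_x = [
        (0.2, 0.05), (0.3, 0.1), (0.4, 0.0), (0.25, 0.15),  # +1: merge
        (2.0, 0.1), (1.8, 0.6), (0.3, 0.8), (2.5, 0.5),     # -1: do not merge
    ]
    train_y = [1, 1, 1, 1, -1, -1, -1, -1]
    w = train_linear_svm(train_x, train_y)
    print(should_merge(w, (0.35, 0.05)))  # True
    print(should_merge(w, (2.25, 0.3)))   # False
```

Each query point is the midpoint of two training samples that share a label, so any linear classifier that separates this toy training set with a positive margin labels the queries the same way. A real layout analyzer would need richer contextual features and one such classifier per operation type.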
Pages: 3200-3213
Number of pages: 14