A machine-learning approach for analyzing document layout structures with two reading orders

Cited by: 9
Authors
Wu, Chung-Chih [1]
Chou, Chien-Hsing [1]
Chang, Fu [1]
Affiliations
[1] Acad Sinica, Inst Informat Sci, Taipei 115, Taiwan
Keywords
binary decision; document layout analysis; reading order; support vector machine; taboo box; textline; text region
DOI
10.1016/j.patcog.2008.03.014
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The purpose of document layout analysis is to locate textlines and text regions in document images, mostly via a series of split-or-merge operations. Before applying such an operation, however, it is necessary to examine the context to decide whether the place chosen for the operation is appropriate. We thus view document layout analysis as a matter of solving a series of binary decision problems, such as whether or not to apply a split-or-merge operation at a chosen place. To solve these problems, we use support vector machines to learn whether or not to apply these operations from training documents in which all textlines and text regions have been located and their identities labeled. The proposed approach is very effective for analyzing documents that allow both horizontal and vertical reading orders. When applied to a test data set composed of eight types of layout structure, the approach's accuracy rates for identifying textlines and text regions are 98.83% and 96.72%, respectively. (C) 2008 Elsevier Ltd. All rights reserved.
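The binary-decision framing in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two features (a normalized gap between neighboring components and their overlap ratio), the sample values, and the function names `train_linear_svm` and `should_merge` are all hypothetical. The sketch also uses a plain linear SVM fitted by subgradient descent on the hinge loss, whereas the record does not say which kernel or features the paper uses.

```python
# Hypothetical sketch: feature names and sample values are invented for
# illustration and are not taken from the paper.

def train_linear_svm(samples, labels, lr=0.05, lam=0.01, epochs=500):
    """Fit a linear SVM by subgradient descent on the L2-regularized hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Weight decay from the L2 regularizer is applied on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # The sample violates the margin, so shift the boundary toward it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def should_merge(model, features):
    """Binary decision: True means 'apply the merge operation at this place'."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, features)) + b > 0.0


# Each sample describes one candidate place between two neighboring components:
# (gap between them divided by mean character height, overlap ratio in [0, 1]).
# Label +1 = the labeled layout merges them, -1 = it keeps them separate.
samples = [
    (0.10, 0.90), (0.20, 0.80), (0.15, 0.95), (0.30, 0.85),  # merged in ground truth
    (1.50, 0.20), (2.00, 0.10), (1.20, 0.30), (1.80, 0.60),  # kept separate
]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

model = train_linear_svm(samples, labels)
predictions = [should_merge(model, x) for x in samples]
```

A layout analyzer built this way would call `should_merge` at every candidate place and apply the operation only when the classifier answers yes; a split decision could be handled by a second classifier trained the same way on its own labeled examples.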
Pages: 3200-3213
Number of pages: 14