Cross-domain document layout analysis using document style guide

Authors
Wu, Xingjiao [1, 2]
Xiao, Luwei [2, 3]
Du, Xiangcheng [1, 4]
Zheng, Yingbin [4]
Li, Xin
Ma, Tianlong [1, 2, 3]
Jin, Cheng [1]
He, Liang [2, 3]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China
[2] East China Normal Univ, Shanghai Key Lab Multidimens Informat Proc, Shanghai 200062, Peoples R China
[3] East China Normal Univ, Sch Comp Sci & Technol, Shanghai 200062, Peoples R China
[4] Videt Lab, Shanghai 201203, Peoples R China
Keywords
Data generation; Document layout analysis; Deep learning; Document cross-domain analysis
DOI
10.1016/j.eswa.2023.123039
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Document layout analysis (DLA) is a crucial computer vision task that involves partitioning document images into high-level semantic regions such as figures, tables, backgrounds, and text. Deep learning models for DLA typically require a large amount of labeled data, which is expensive to obtain. Although some researchers train on generated data, a substantial style gap exists between the generated and target data. Moreover, the generation process needs to be better controlled to improve the quality of the generated samples. To address these challenges, we propose a cross-domain DLA framework called DL-DSG, which leverages document style guidance. DL-DSG comprises three components: the document layout generator (DLG), which generates document element locations; the document element decorator (DED), which fills the elements; and the document style discriminator (DSD), which provides style guidance. In addition to generating controlled documents, we focus on bridging the gap between the generated and target samples. To this end, we introduce a novel strategy that transforms document style judgment into the document cross-domain style guidance component. We evaluate the effectiveness of DL-DSG on popular DLA datasets, including PubLayNet, DSSE-200, CS-150, and CDSSE, and demonstrate its superior performance.
Pages: 11