Cross-domain document layout analysis using document style guide

Cited by: 0
Authors
Wu, Xingjiao [1, 2]
Xiao, Luwei [2, 3]
Du, Xiangcheng [1, 4]
Zheng, Yingbin [4]
Li, Xin
Ma, Tianlong [1, 2, 3]
Jin, Cheng [1]
He, Liang [2, 3]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China
[2] East China Normal Univ, Shanghai Key Lab Multidimens Informat Proc, Shanghai 200062, Peoples R China
[3] East China Normal Univ, Sch Comp Sci & Technol, Shanghai 200062, Peoples R China
[4] Videt Lab, Shanghai 201203, Peoples R China
Keywords
Data generation; Document layout analysis; Deep learning; Document cross-domain analysis;
DOI
10.1016/j.eswa.2023.123039
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Document layout analysis (DLA) is a crucial computer vision task that involves partitioning document images into high-level semantic regions such as figures, tables, backgrounds, and texts. Deep learning models for DLA typically require a large amount of labeled data, which can be expensive. Though some researchers use generated data for training, a substantial style gap exists between the generated and target data. Moreover, it is necessary to improve the quality of the generated samples to achieve better control. To address these challenges, we propose a cross-domain DLA framework called DL-DSG, which leverages document style guidance. DL-DSG comprises three components: the document layout generator (DLG) responsible for generating document element locations, the document element decorator (DED) for filling the elements, and the document style discriminator (DSD) for style guidance. In addition to generating controlled documents, we also focus on bridging the gap between the generated and target samples. To this end, we introduce a novel strategy that transforms document style judgment into the document cross-domain style guidance component. We evaluate the effectiveness of DL-DSG on popular DLA datasets, including PubLayNet, DSSE-200, CS-150, and CDSSE, and demonstrate its superior performance.
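The three-component split described in the abstract (a generator that places elements, a decorator that fills them, a discriminator that steers style) can be illustrated with a toy sketch. This is not the authors' implementation: every name below (`Element`, `generate_layout`, `decorate`, `style_distance`, `generate_guided`) is hypothetical, the generator is a plain random row stacker, and the "discriminator" is a hand-written L1 distance over class-area shares rather than a learned model.

```python
import random
from dataclasses import dataclass

CLASSES = ["text", "figure", "table"]  # assumed region classes for the toy example

@dataclass
class Element:
    label: str
    box: tuple  # (x0, y0, x1, y1) in page-relative units, 0..1

def generate_layout(rng, n_rows=6):
    """Layout-generator stand-in: stack n_rows full-width boxes top to bottom."""
    height = 1.0 / n_rows
    return [Element(rng.choice(CLASSES), (0.0, i * height, 1.0, (i + 1) * height))
            for i in range(n_rows)]

def decorate(layout):
    """Decorator stand-in: attach placeholder content to every element."""
    return [(el, f"<{el.label} content>") for el in layout]

def class_area_share(layout):
    """Fraction of page area covered by each class."""
    share = {c: 0.0 for c in CLASSES}
    for el in layout:
        x0, y0, x1, y1 = el.box
        share[el.label] += (x1 - x0) * (y1 - y0)
    return share

def style_distance(layout, target_share):
    """Discriminator stand-in: L1 distance between class-area shares (lower is closer)."""
    share = class_area_share(layout)
    return sum(abs(share[c] - target_share[c]) for c in CLASSES)

def generate_guided(target_share, n_candidates=50, seed=0):
    """Sample candidate layouts, keep the one closest to the target style, then fill it."""
    rng = random.Random(seed)
    candidates = [generate_layout(rng) for _ in range(n_candidates)]
    best = min(candidates, key=lambda lay: style_distance(lay, target_share))
    return decorate(best)
```

Calling `generate_guided({"text": 0.5, "figure": 0.25, "table": 0.25})` returns a list of `(Element, content)` pairs for the candidate whose class mix is closest to the assumed target. Selecting among sampled candidates is only the simplest way to show style guidance; the abstract does not say how the DSD signal is actually used during generation.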
Pages: 11
Related papers
50 in total
  • [31] Document page segmentation and layout analysis using soft ordering
    Mitchell, PE
    Yan, H
    15TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 1, PROCEEDINGS: COMPUTER VISION AND IMAGE ANALYSIS, 2000, : 458 - 461
  • [32] Chinese document layout analysis using an adaptive regrouping strategy
    Chang, F
    Chu, SY
    Chen, CY
    PATTERN RECOGNITION, 2005, 38 (02) : 261 - 271
  • [33] Document page layout analysis using Harris corner points
    Nourbakhsh, Farshad
    Pati, Peeta Basa
    Ramakrishnan, A. G.
    FOURTH INTERNATIONAL CONFERENCE ON INTELLIGENT SENSING AND INFORMATION PROCESSSING, PROCEEDINGS, 2006, : 149 - +
  • [34] Font Style Transfer Using Neural Style Transfer and Unsupervised Cross-domain Transfer
    Narusawa, Atsushi
    Shimoda, Wataru
    Yanai, Keiji
    COMPUTER VISION - ACCV 2018 WORKSHOPS, 2019, 11367 : 100 - 109
  • [35] Cross-Domain NER using Cross-Domain Language Modeling
    Jia, Chen
    Liang, Xiaobo
    Zhang, Yue
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 2464 - 2474
  • [36] Document layout extraction using soft ordering
    Mitchell, PE
    Yan, H
    OPTICAL ENGINEERING, 2002, 41 (11) : 2831 - 2843
  • [37] Document Layout Analyze Using Hierarchical Processing
    Boiangiu, Costin-Anton
    Cananau, Dan-Cristian
    Bucur, Ion
    PROCEEDINGS OF THE 1ST WSEAS INTERNATIONAL CONFERENCE ON VISUALIZATION, IMAGING AND SIMULATION (VIS'08), 2008, : 72 - 76
  • [38] Visual Similarity Based Document Layout Analysis
    Di Wen
    Xiao-Qing Ding
    Journal of Computer Science and Technology, 2006, 21 : 459 - 465
  • [39] Document layout analysis based on emergent computation
    Ishitani, Y
    PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS 1 AND 2, 1997, : 45 - 50
  • [40] Segmentation for document layout analysis: not dead yet
    Logan Markewich
    Hao Zhang
    Yubin Xing
    Navid Lambert-Shirzad
    Zhexin Jiang
    Roy Ka-Wei Lee
    Zhi Li
    Seok-Bum Ko
    International Journal on Document Analysis and Recognition (IJDAR), 2022, 25 : 67 - 77