Cross-domain document layout analysis using document style guide

Authors
Wu, Xingjiao [1 ,2 ]
Xiao, Luwei [2 ,3 ]
Du, Xiangcheng [1 ,4 ]
Zheng, Yingbin [4 ]
Li, Xin
Ma, Tianlong [1 ,2 ,3 ]
Jin, Cheng [1 ]
He, Liang [2 ,3 ]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China
[2] East China Normal Univ, Shanghai Key Lab Multidimens Informat Proc, Shanghai 200062, Peoples R China
[3] East China Normal Univ, Sch Comp Sci & Technol, Shanghai 200062, Peoples R China
[4] Videt Lab, Shanghai 201203, Peoples R China
Keywords
Data generation; Document layout analysis; Deep learning; Document cross-domain analysis;
DOI
10.1016/j.eswa.2023.123039
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Document layout analysis (DLA) is a crucial computer vision task that partitions document images into high-level semantic regions such as figures, tables, backgrounds, and text. Deep learning models for DLA typically require large amounts of labeled data, which are expensive to obtain. Although some researchers train on generated data, a substantial style gap exists between the generated and target data. Moreover, the quality of the generated samples must be improved to achieve better control over generation. To address these challenges, we propose a cross-domain DLA framework called DL-DSG, which leverages document style guidance. DL-DSG comprises three components: the document layout generator (DLG), which generates document element locations; the document element decorator (DED), which fills the elements; and the document style discriminator (DSD), which provides style guidance. In addition to generating controlled documents, we focus on bridging the gap between the generated and target samples. To this end, we introduce a novel strategy that turns document style judgment into the cross-domain document style guidance component. We evaluate DL-DSG on popular DLA datasets, including PubLayNet, DSSE-200, CS-150, and CDSSE, and demonstrate its superior performance.
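The three roles named in the abstract (layout generation, element decoration, style judgment) can be illustrated with a toy sketch. This is not the DL-DSG implementation: every name below (`Element`, `generate_layout`, `decorate`, `style_score`, `generate_guided`) is hypothetical, the "discriminator" is a hand-written comparison of per-class area shares rather than a learned model, and style guidance is approximated by simply keeping the best-scoring candidate out of several random layouts.

```python
import random
from dataclasses import dataclass

KINDS = ("text", "figure", "table")


@dataclass(frozen=True)
class Element:
    kind: str
    x: int
    y: int
    w: int
    h: int


def generate_layout(rng, page_w=600, page_h=800, margin=40, gap=10):
    """Layout-generator stand-in: stack randomly sized elements in one column."""
    elements = []
    y = margin
    while True:
        h = rng.randint(60, 200)
        if y + h > page_h - margin:
            break
        elements.append(Element(rng.choice(KINDS), margin, y, page_w - 2 * margin, h))
        y += h + gap
    return elements


def decorate(layout):
    """Decorator stand-in: pair each element with placeholder content."""
    return [(e, f"<{e.kind} content {e.w}x{e.h}>") for e in layout]


def area_shares(layout):
    """Fraction of the total element area occupied by each element kind."""
    total = sum(e.w * e.h for e in layout)
    return {k: sum(e.w * e.h for e in layout if e.kind == k) / total for k in KINDS}


def style_score(layout, target_shares):
    """Discriminator stand-in (assumption): 1 minus half the L1 distance
    between the layout's area shares and the target domain's shares."""
    shares = area_shares(layout)
    return 1.0 - 0.5 * sum(abs(shares[k] - target_shares[k]) for k in KINDS)


def generate_guided(rng, target_shares, n_candidates=20):
    """Sample several layouts and keep the one closest to the target style."""
    candidates = [generate_layout(rng) for _ in range(n_candidates)]
    best = max(candidates, key=lambda c: style_score(c, target_shares))
    return decorate(best), style_score(best, target_shares)


if __name__ == "__main__":
    # Hypothetical target-domain statistics, chosen only for illustration.
    target = {"text": 0.7, "figure": 0.2, "table": 0.1}
    document, score = generate_guided(random.Random(0), target)
    for element, content in document:
        print(element.kind, (element.x, element.y, element.w, element.h), content)
    print("style score:", round(score, 3))
```

Selecting among candidates keeps the sketch short. A trained discriminator whose feedback steers the generator directly would be closer in spirit to the guidance the abstract describes, but the abstract does not give enough detail to reproduce that here.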
Pages: 11